THE C. ELEGANS gro-1 GENE

RELATED APPLICATIONS 

This application is a continuation-in-part of
PCT/CA98/00803 filed August 20, 1998, now at the
national phase, and claiming priority on Canadian
patent application serial number 2,210,251 filed August
25, 1997, now abandoned.

BACKGROUND OF THE INVENTION 

(a) Field of the Invention

The invention relates to the identification of the
gro-1 gene and of four other genes located within the
same operon, and to the demonstration that the gro-1
gene is involved in the control of a central
physiological clock.

(b) Description of Prior Art

The gro-1 gene was originally defined by a
spontaneous mutation isolated from a Caenorhabditis
elegans strain that had recently been established from
a wild isolate (J. Hodgkin and T. Doniach, Genetics
146: 149-164 (1997)). We have shown that the activity
of the gro-1 gene controls how fast the worms live and
how soon they die. The time taken to progress through
embryonic and post-embryonic development, as well as
the life span, is increased in gro-1 mutants (Lakowski
and Hekimi, Science 272:1010-1013 (1996)). Furthermore,
these defects are maternally rescuable: when
homozygous mutants (gro-1/gro-1) derive from a
heterozygous mother (gro-1/+), these animals appear to
be phenotypically wild-type. The defects are seen only
when homozygous mutants derive from a homozygous mother
(Lakowski and Hekimi, Science 272:1010-1013 (1996)).
In general, the properties of the gro-1 gene are
similar to those of three other genes, clk-1, clk-2 and
clk-3 (Wong et al., Genetics 139: 1247-1259 (1995);
Hekimi et al., Genetics 141: 1351-1367 (1995);
Lakowski and Hekimi, Science 272:1010-1013 (1996)),
and this combination of phenotypes has been called the
Clk ("clock") phenotype. All four of these genes
interact to determine developmental rate and longevity
in the nematode. Detailed examination of the clk-1
mutant phenotype has led to the suggestion that there
exists a central physiological clock which coordinates
all or many aspects of cellular physiology, from cell
division and growth to aging. All four genes have a
similar phenotype and thus appear to impinge on this
physiological clock.
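The maternal rescue described above can be written as a simple rule. The following is only a minimal sketch: the function name and the genotype strings are illustrative, and it assumes that the mutation is recessive (so that any animal carrying a wild-type allele is wild-type) and that rescue by a heterozygous mother is complete.

```python
def clk_phenotype(zygote_genotype: str, mother_genotype: str) -> str:
    """Return 'wild-type' or 'Clk' under a strict maternal-rescue rule.

    Genotypes are written 'gro-1/gro-1', 'gro-1/+' or '+/+'.
    Assumption: one wild-type allele, in the animal itself or in
    its mother, is enough for a wild-type phenotype.
    """
    def has_wild_type_allele(genotype: str) -> bool:
        return "+" in genotype.split("/")

    if has_wild_type_allele(zygote_genotype) or has_wild_type_allele(mother_genotype):
        return "wild-type"
    return "Clk"


# Homozygous mutant from a heterozygous mother: maternally rescued.
print(clk_phenotype("gro-1/gro-1", "gro-1/+"))
# Homozygous mutant from a homozygous mutant mother: defects are seen.
print(clk_phenotype("gro-1/gro-1", "gro-1/gro-1"))
```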

It would be highly desirable to be provided with 
the molecular identity of the gro-1 gene. 

SUMMARY OF THE INVENTION

One aim of the present invention is to provide 
the molecular identity of the gro-1 gene and four other 
genes located within the same operon. 

In accordance with the present invention there
is provided a gro-1 gene which has a function at the
level of cellular physiology involved in developmental
rate and longevity, wherein gro-1 is located within an
operon and gro-1 mutants have a longer life and an
altered cellular metabolism relative to the wild-type.

In accordance with a preferred embodiment, the
gro-1 gene of the present invention codes for a GRO-1
protein having the amino acid sequence set forth in
Figs. 3A-3B (SEQ. ID. NO:2).

The gro-1 gene is located within an operon which
has the nucleotide sequence set forth in SEQ. ID. NO:1
and which also codes for four other genes, referred to
as the gop-1, gop-2, gop-3 and hap-1 genes.

In accordance with a preferred embodiment, the
gop-1 gene of the present invention codes for a GOP-1
protein having the amino acid sequence set forth in
Figs. 13A-13C (SEQ. ID. NO:4).

In accordance with a preferred embodiment, the
gop-2 gene of the present invention codes for a GOP-2
protein having the amino acid sequence set forth in
Fig. 14 (SEQ. ID. NO:5).

In accordance with a preferred embodiment, the
gop-3 gene of the present invention codes for a GOP-3
protein having the amino acid sequence set forth in
Figs. 15A-15B (SEQ. ID. NO:6).

In accordance with a preferred embodiment, the
hap-1 gene of the present invention codes for a HAP-1
protein having the amino acid sequence set forth in
Fig. 16 (SEQ. ID. NO:7).

In accordance with a preferred embodiment of the
present invention, the gro-1 gene is of human origin
and has the nucleotide sequence set forth in Fig. 8
(SEQ. ID. NO:3).

In accordance with a preferred embodiment of the
present invention, there is provided a mutant GRO-1
protein which has the amino acid sequence set forth in
Fig. 3C.

In accordance with the present invention there
is also provided a GRO-1 protein which has a function
at the level of cellular physiology involved in
developmental rate and longevity, wherein said GRO-1
protein is encoded by the gro-1 gene identified above.

In accordance with a preferred embodiment of the
present invention, there is provided a GRO-1 protein
which has the amino acid sequence set forth in Figs.
3A-3B (SEQ. ID. NO:2).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-1 protein
which has the amino acid sequence set forth in Figs.
13A-13C (SEQ. ID. NO:4).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-2 protein
which has the amino acid sequence set forth in Fig. 14
(SEQ. ID. NO:5).

In accordance with a preferred embodiment of the
present invention, there is provided a GOP-3 protein
which has the amino acid sequence set forth in Figs.
15A-15B (SEQ. ID. NO:6).

In accordance with a preferred embodiment of the
present invention, there is provided a HAP-1 protein
which has the amino acid sequence set forth in Fig. 16
(SEQ. ID. NO:7).

In accordance with the present invention there
is also provided a method for the diagnosis and/or
prognosis of cancer in a patient, which comprises the
steps of:

a) obtaining a tissue sample from said patient;

b) analyzing DNA of the obtained tissue sample of
step a) to determine if the human gro-1 gene is
altered, wherein alteration of the human gro-1 gene is
indicative of cancer.

In accordance with the present invention there
is also provided a mouse model of aging and cancer,
which comprises a gene knock-out of the murine gene
homologous to gro-1.

In accordance with the present invention there
is provided the use of compounds interfering with the
enzymatic activity of GRO-1, GOP-1, GOP-2, GOP-3 or
HAP-1 for enhancing longevity of a host.

In accordance with the present invention there
is provided the use of compounds interfering with the
enzymatic activity of GRO-1, GOP-1, GOP-2, GOP-3 or
HAP-1 for inhibiting tumorous growth.



BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1A illustrates the genetic mapping of
gro-1;

Fig. 1B illustrates the physical map of the
gro-1 region;

Fig. 2A illustrates cosmid clones able to rescue
the gro-1(e2400) mutant phenotype;

Fig. 2B illustrates the genes predicted by
Genefinder, the relevant restriction sites and the
fragments used to subclone the region;

Figs. 3A-3C illustrate the genomic sequence and
translation of the C. elegans gro-1 gene (SEQ. ID.
NO:2);

Fig. 3D illustrates the predicted mutant protein;

Fig. 4A illustrates the five genes of the gro-1
operon (SEQ. ID. NO:1);

Fig. 4B illustrates the trans-splicing pattern of
the five genes of the gro-1 operon;

Figs. 5A-5B illustrate the alignment of gro-1
with the published sequences of the E. coli (P16384)
and yeast (P07884) enzymes;

Fig. 6 illustrates the biosynthetic step catalyzed
by DMAPP transferase (MiaAp in E. coli, Mod5p in
S. cerevisiae, and GRO-1 in C. elegans);

Fig. 7 illustrates the alignment of the predicted
HAP-1 amino acid sequence with homologues from
other species;

Fig. 8 illustrates the full mRNA sequence of the
human homologue of gro-1, referred to as hgro-1 (SEQ.
ID. NO:3);

Figs. 9A-9B illustrate a comparison of the
conceptual amino acid sequences for GRO-1 and hgro-1p;

Fig. 10 illustrates a conceptual translation of
a partial sequence of the Drosophila homologue of gro-1
(AA816785);

Figs. 11A-11B illustrate the structure of pMQ8;

Fig. 12 illustrates the construction of pMQ18;

Figs. 13A-13E illustrate the genomic sequence
and translation of the gop-1 gene (SEQ. ID. NO:4);

Figs. 14A-14B illustrate the genomic sequence and
translation of the gop-2 gene (SEQ. ID. NO:5);

Figs. 15A-15D illustrate the genomic sequence
and translation of the gop-3 gene (SEQ. ID. NO:6); and

Figs. 16A-16B illustrate the genomic sequence and
translation of the hap-1 gene (SEQ. ID. NO:7).

DETAILED DESCRIPTION OF THE INVENTION

The gro-1 phenotype

In addition to the previously documented
phenotypes, we recently found that gro-1 mutants are
temperature-sensitive for fertility. At 25°C the progeny
of these mutants is reduced so much that a viable
strain cannot be propagated. In contrast, gro-1
strains can easily be propagated at 15 and 20°C.

We also discovered that the gro-1(e2400) mutation
increases the incidence of spontaneous mutations.
As gro-1(e2400) was originally identified in a
non-standard background (Hodgkin and Doniach, Genetics
146: 149-164 (1997)), we first backcrossed the mutation
8 times against N2, the standard wild type strain. We
then undertook to examine the gro-1 strain and N2 for
the occurrence of spontaneous mutants which could be
identified visually. We focused on the two classes of
mutants which are detected the most easily by simple
visual inspection, uncoordinated mutants (Unc) and
dumpy mutants (Dpy). We examined 8200 wild type worms
and found no spontaneous visible mutant. By contrast,
we found 6 spontaneous mutants among 12500 gro-1
mutants examined. All mutants produced entirely mutant
progeny, indicating that they were homozygous.
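As a rough check on these counts, the observed frequencies can be compared with a one-sided exact test. This is only an illustrative sketch: the choice of a hypergeometric (Fisher-type) test and the function name are assumptions made here, not an analysis reported in the text.

```python
from math import comb


def fisher_one_sided(mut_a: int, total_a: int, mut_b: int, total_b: int) -> float:
    """Probability that group A shows mut_a or fewer mutants,
    given the total number of mutants and the two group sizes
    (hypergeometric tail, i.e. a one-sided Fisher exact test)."""
    mutants = mut_a + mut_b
    animals = total_a + total_b
    denominator = comb(animals, mutants)
    tail = sum(
        comb(total_a, k) * comb(total_b, mutants - k)
        for k in range(mut_a + 1)
    )
    return tail / denominator


# Counts taken from the text: 0 of 8200 wild type, 6 of 12500 gro-1.
gro1_frequency = 6 / 12500
p_value = fisher_one_sided(0, 8200, 6, 12500)
print(f"gro-1 mutant frequency: {gro1_frequency:.2e}")
print(f"one-sided p-value: {p_value:.3f}")
```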



Sequences of all primers used

Name       Orientation   Sequence (5'-3')                               SEQ ID NO:
SHP91      forward       CGAACACTTTATATTTCTCG                           SEQ. ID. NO:8
SHP92      reverse       GATAGTTCCCTTCGTTCGGG                           SEQ. ID. NO:9
SHP93      forward       TTTCTGGATTTTAACCTTCC                           SEQ. ID. NO:10
SHP94      forward       TTTCCGAGAAGTCACGTTGG                           SEQ. ID. NO:11
SHP95      reverse       TACAGGAATTTTTGAACGGG                           SEQ. ID. NO:12
SHP96      forward       CTTCAGATGACGTGGATTCC                           SEQ. ID. NO:13
SHP97      forward       GGAATCCGAAAAAGTGAACT                           SEQ. ID. NO:14
SHP98      forward       [illegible]                                    SEQ. ID. NO:15
SHP99      reverse       ATCGATACCACCGTCTCTGG                           SEQ. ID. NO:16
[illegible] [illegible]  TTGAATCTACACTAATCACC                           SEQ. ID. NO:17
[illegible] [illegible]  [illegible]                                    SEQ. ID. NO:18
[illegible] forward      [illegible]                                    SEQ. ID. NO:19
[illegible] forward      [illegible]                                    SEQ. ID. NO:20
SHP119     reverse       ACATCTTTATCCATTTCTCC                           SEQ. ID. NO:21
[illegible] forward      [illegible]                                    SEQ. ID. NO:22
[illegible] reverse      AAAAACCACTTGATATAAGG                           SEQ. ID. NO:23
SHP130     reverse       CATCCAAAAGCAGTATCACC                           SEQ. ID. NO:24
[illegible] forward      TTAATTGGATGCAAGCACCCC                          SEQ. ID. NO:25
[illegible] reverse      ATTACTATACGAACATTTCC                           SEQ. ID. NO:26
[illegible] forward      TTGTAAAGGCGTTAGTTTGG                           SEQ. ID. NO:27
[illegible] forward      [illegible]                                    SEQ. ID. NO:28
SHP140     forward       CGACGGGGAGAAGGTGACGG                           SEQ. ID. NO:29
SHP141     reverse       AAAACTTCTACCAACAATGG                           SEQ. ID. NO:30
SHP142     reverse       CGTAATCTCTCTCGATTAGC                           SEQ. ID. NO:31
SHP143     reverse       CCGTGGGATGGCTACTTGCC                           SEQ. ID. NO:32
SHP144     reverse       TGGATTTGTGGCACGAGCGG                           SEQ. ID. NO:33
SHP145     reverse       TTGATTGCCTCTCCTCGTCC                           SEQ. ID. NO:34
SHP146     reverse       ATCAACATCTGATTGATTCC                           SEQ. ID. NO:35
SHP151     forward       CAGCGAGCGCATGCAACTATATATTGAGCAGG               SEQ. ID. NO:36
SHP159     forward       AATAAATATTTAAATATTCAGATATACCCTGAACTCTACAG      SEQ. ID. NO:37
SHP160     reverse       AAACTGTAGAGTTCAGGGTATATCTGAATATTTAAATATTTATTC  SEQ. ID. NO:38
SHP161     forward       GTACGTGGAGCTCTGCAACTATATATTGAGCAGG             SEQ. ID. NO:39
SHP162     reverse       ATGACACTGCAGGATAGTTCCCTTCGTTCGGG               SEQ. ID. NO:40
[illegible] forward      [illegible]                                    SEQ. ID. NO:41
[illegible] forward      [illegible]                                    SEQ. ID. NO:42
[illegible] reverse      [illegible]                                    SEQ. ID. NO:43
SHP170     [illegible]   [illegible]                                    SEQ. ID. NO:44
SHP171     reverse       AGTAATTGTACATTTAGTGG                           SEQ. ID. NO:45
SHP172     forward       ATTAACCTTACTTACTTACC                           SEQ. ID. NO:46
SHP173     forward       CTAAACTAAGTAATATAACC                           SEQ. ID. NO:47
SHP174     reverse       GTTGATTCTTTGAGCACTGG                           SEQ. ID. NO:48
SHP175     forward       AATTCGACCAATTACATTGG                           SEQ. ID. NO:49
SHP176     reverse       AACATAGTTGTTGAGGAAGG                           SEQ. ID. NO:50
SHP177     forward       AATTAATGGAGATTCTACGG                           SEQ. ID. NO:51
SHP178     forward       TCAGCATCTAGAAATGCAGG                           SEQ. ID. NO:52
SHP179     reverse       CGAATGTCAACATTCACTGG                           SEQ. ID. NO:53
SHP180     forward       CTTAACCTGATGTGTACTCG                           SEQ. ID. NO:54
SHP181     forward       ATGAAGCTTTAGAGGATGCC                           SEQ. ID. NO:55
SHP182     forward       CGACGAATTTCTGGAGTCGG                           SEQ. ID. NO:56
SHP183     reverse       ACTGCATTATCCATTAATCC                           SEQ. ID. NO:57
SHP184     reverse       CACCCAAATAACATCTATCC                           SEQ. ID. NO:58
SHP185     forward       TTTAACCTCATCTTCGCTGG                           SEQ. ID. NO:59
SHP190     forward       ATGTTCCGCAAGCTTGGTTC                           SEQ. ID. NO:60
SL1        forward       TTTAATTACCCAAGTTTGAG                           SEQ. ID. NO:61
SL2        forward       TTTTAACCCAGTTACTCAAG                           SEQ. ID. NO:62



Positional cloning of gro-1

gro-1 lies on linkage group III, very close to
the gene clk-1. To genetically order gro-1 with
respect to clk-1 on the genetic map, 54 recombinants in
the dpy-17 to lon-1 interval were selected from among
the self progeny of a strain which was unc-79(e1030) +
+ clk-1(e2519) lon-1(e678) +/+ dpy-17(e164)
gro-1(e2400) + sma-4(e729). Three of these showed
neither the Gro-1 nor the Clk-1 phenotypes, but carried
unc-79 and sma-4, indicating that these recombination
events had occurred between gro-1 and clk-1. From the
disposition of the markers, this showed that the gene
order was dpy-17 gro-1 clk-1 lon-1, and the frequency of
events indicated that the gro-1 to clk-1 distance was
0.03 map units. In this region of the genome, this
corresponds to a physical map distance of ~20 kb.
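The arithmetic behind such an estimate can be sketched as follows. The text gives the number of events (3 of 54) and the result (0.03 map units) but not the genetic length of the dpy-17 to lon-1 interval, so the sketch back-calculates the interval length that those figures imply; that value, the local kb-per-map-unit ratio, and the function name are derived here for illustration only and are not stated in the text.

```python
def subinterval_distance(events: int, recombinants: int, interval_map_units: float) -> float:
    """Genetic length of a sub-interval, assuming recombination events
    are spread evenly over the selected interval."""
    return interval_map_units * events / recombinants


# Figures stated in the text.
events_between_gro1_and_clk1 = 3
recombinants_in_interval = 54
reported_distance_mu = 0.03
reported_distance_kb = 20

# Interval length implied by the reported figures (an inference, not a
# value given in the text).
implied_interval_mu = (
    reported_distance_mu * recombinants_in_interval / events_between_gro1_and_clk1
)

# Local ratio of physical to genetic distance implied by "0.03 m.u. ~ 20 kb".
kb_per_map_unit = reported_distance_kb / reported_distance_mu

print(f"implied dpy-17 to lon-1 interval: {implied_interval_mu:.2f} m.u.")
print(f"implied local ratio: {kb_per_map_unit:.0f} kb per m.u.")
print(subinterval_distance(3, 54, implied_interval_mu))
```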

Several cosmids containing wild-type DNA spanning
this region of the genome were tested by microinjection
into gro-1 mutants for their ability to complement
the gro-1(e2400) mutation (Fig. 1). gro-1 was
mapped between dpy-17 and lon-1 on the third
chromosome, 0.03 m.u. to the left of clk-1 (Fig. 1A).

Based on the above genetic mapping, gro-1 was
estimated to be approximately 20 kb to the left of
clk-1. Eight cosmids (represented by medium bold lines)
were selected as candidates for transformation rescue
(Fig. 1B). Those which were capable of rescuing the
gro-1(e2400) mutant phenotype are represented as heavy
bold lines (Fig. 1B).

Of these, only B0498, C34E10 and ZC395 were able
to rescue the mutant phenotype. Transgenic animals
were fully rescued for developmental speed. In
addition, the transgenic DNA was able to recapitulate
the maternal rescue seen with the wild-type gene, that
is, mutants not carrying the transgenic DNA but derived
from transgenic mothers display a wild type phenotype.
The 7 kb region common to the three rescuing cosmids
had been completely sequenced, and this sequence was
publicly available.

We generated subclones of ZC395 and assayed them
for rescue (Fig. 2). The common 6.5 kb region is
enlarged in part B. B0498 has not been sequenced;
its ends therefore cannot be positioned and are
represented by arrows.

One subclone, pMQ2, spanned 3.9 kb and was also
able to completely rescue the growth rate defect and
recapitulate the maternal effect. The sequences in
pMQ2 potentially encode two genes. However, a second
subclone, pMQ3, which contained only the first of the
potential genes (named ZC395.7 in Fig. 2A), was unable
to rescue.

Furthermore, frameshifts which would disrupt
each of the two genes' coding sequences were
constructed in pMQ2 and tested for rescue. Disruption of
the first gene (in pMQ4) did not eliminate rescuing
ability, but disruption of the second gene (in pMQ5)
did. This indicates that the gro-1 rescuing activity is
provided by the second predicted gene.

pMQ2 was generated by deleting a 29.9 kb SpeI
fragment from ZC395, leaving the left-most 3.9 kb
region containing the predicted genes ZC395.7 and
ZC395.6 (Fig. 2B). pMQ3 was created in the same
fashion, by deleting a 31.4 kb NdeI fragment from ZC395,
leaving only ZC395.7 intact. In pMQ4, a frameshift was
induced in ZC395.7 by degrading the 4 bp overhang of
the ApaI site. A frameshift was also induced in pMQ5
by filling in the 2 bp overhang of the NdeI site found
in the second exon of ZC395.6. These frameshifts
presumably abolish any function of ZC395.7 and ZC395.6
respectively. The dotted lines represent the extent of
frameshift that resulted from these alterations.

To establish the splicing pattern of this gene,
cDNAs encompassing the 5' and 3' halves of the gene
were produced by reverse transcription-PCR and
sequenced (Fig. 3).

This revealed that the gene is composed of 9
exons, spans ~2 kb, and produces an mRNA of 1.3 kb. To
confirm that this is indeed the gro-1 gene, genomic DNA
was amplified by PCR from a strain containing the
gro-1(e2400) mutation and the amplified product was
sequenced. A lesion was found in the 5th exon, where a
9 base-pair sequence has been replaced by a 2 base-pair
insertion, leading to a frameshift (Fig. 3C). In
Fig. 3C, the residues which differ from wild type
are in bold.

The reading frame continues out-of-frame for
another 33 residues before terminating.
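Why replacing 9 base pairs with 2 shifts the reading frame can be checked with a line of arithmetic: the net change in length is not a multiple of three. A minimal sketch (the function names are illustrative):

```python
def net_length_change(deleted_bp: int, inserted_bp: int) -> int:
    """Net change in sequence length caused by a deletion plus insertion."""
    return inserted_bp - deleted_bp


def causes_frameshift(deleted_bp: int, inserted_bp: int) -> bool:
    """A lesion shifts the reading frame when the net length change
    is not a multiple of three."""
    return net_length_change(deleted_bp, inserted_bp) % 3 != 0


# The e2400 lesion as described in the text: 9 bp deleted, 2 bp inserted.
print(net_length_change(9, 2))   # net change in base pairs
print(causes_frameshift(9, 2))   # True: the frame is shifted
print(causes_frameshift(9, 0))   # False: a clean 9 bp deletion stays in frame
```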

Figs. 3A-3B illustrate the coding sequence in
capital letters, while the introns and the untranslated
and intergenic sequences are in lower case letters.
The protein sequence is shown underneath the
coding sequence. Position 1 of the nucleotide sequence
is the first base after the SL2 trans-splice acceptor
sequence. Position 1 of the protein sequence is the
initiator methionine. All PCR primers used for genomic
and cDNA amplification are represented by arrows. For
primers extending downstream (arrows pointing right)
the primer sequence corresponds exactly to the
nucleotides over which the arrow extends. For primers
extending upstream (arrows pointing left) the primer
sequence is the complement of the sequence
under the arrow. In both cases the arrow head is at
the 3' end of the primer. The sequences of the two
primers which flank gro-1 (SHP93 and SHP92) are not
represented in this figure. Their sequences are: SHP93
TTTCTGGATTTTAACCTTCC (SEQ. ID. NO:10) and SHP92
GATAGTTCCCTTCGTTCGGG (SEQ. ID. NO:9). The wild type
splicing pattern was determined by sequencing of the
cDNA. Identification of the e2400 lesion was
accomplished by sequencing the e2400 allele. The e2400
lesion consists of a 9 bp deletion and a 2 bp insertion
at position 1196, resulting in a frameshift.
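The relationship between a left-pointing primer and the sequence drawn under its arrow is a reverse complement: complement each base, then read in the opposite direction so that the arrow head is the 3' end. A minimal sketch, using the SHP92 sequence quoted above as input (the function name is illustrative):

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


# SHP92 extends upstream, so the top-strand sequence under its arrow
# is the reverse complement of the primer itself.
shp92 = "GATAGTTCCCTTCGTTCGGG"
top_strand_under_arrow = reverse_complement(shp92)
print(top_strand_under_arrow)  # CCCGAACGAAGGGAACTATC

# Applying the operation twice returns the original primer.
print(reverse_complement(top_strand_under_arrow) == shp92)
```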

gro-1 is part of a complex operon (Figs. 3A-3B)

Amplification of the 5' end of gro-1 from cDNA
occurred only when the trans-spliced leader SL2 was
used as the 5' primer, and not when SL1 was used. SL2
is used for trans-splicing to the downstream gene when
two genes are organized into an operon (Spieth et al.,
Cell 73: 521-532 (1993); Zorio et al., Nature 372:
270-272 (1994)). This indicates that at least one gene
upstream of gro-1 is co-transcribed with gro-1 from a
common promoter. We found that sequences from the 5'
end of the three next predicted genes upstream of gro-1
(ZC395.7, C34E10.1, and C34E10.2) all could only be
amplified with SL2. Sequences from the fourth
predicted upstream gene (C34E10.3), however, could be
amplified with neither spliced leader, suggesting that
it is not trans-spliced. The distance between genes in
operons appears to have an upper limit (Spieth et al.,
Cell 73: 521-532 (1993); Zorio et al., Nature 372:
270-272 (1994)), and no gene is predicted to be close
enough upstream of C34E10.3 or downstream of gro-1 to
be co-transcribed with these genes. Our findings
therefore suggest that gro-1 is the last gene in an
operon of five co-transcribed genes (Fig. 4).
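The reasoning from amplification results to operon structure can be laid out as a small lookup. This is only a sketch of the logic: the True/False values transcribe the outcomes described above, while the class labels and function name are illustrative.

```python
def leader_class(sl1_product: bool, sl2_product: bool) -> str:
    """Classify a gene by which spliced-leader primer yields a PCR product."""
    if sl2_product and not sl1_product:
        return "SL2 only"
    if sl1_product and not sl2_product:
        return "SL1 only"
    if sl1_product and sl2_product:
        return "SL1 and SL2"
    return "not trans-spliced"


# (gene, product with SL1, product with SL2), listed upstream to downstream.
observations = [
    ("C34E10.3", False, False),
    ("C34E10.2", False, True),
    ("C34E10.1", False, True),
    ("ZC395.7", False, True),
    ("gro-1", False, True),
]

classes = {gene: leader_class(sl1, sl2) for gene, sl1, sl2 in observations}

# SL2-only genes are read as downstream genes of an operon; the gene
# immediately upstream of them is taken as the first gene.
downstream_genes = [gene for gene, cls in classes.items() if cls == "SL2 only"]
print(classes)
print(f"{len(downstream_genes) + 1} genes in the proposed operon")
```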

Nested PCR was used to amplify the 5' end of
each gene. SL1 or SL2 specific primers were used in
conjunction with a pair of gene-specific primers. cDNA
generated by RT-PCR using mixed stage N2 RNA was used
as template in the nested PCR. Fig. 4A illustrates a
schematic of the gro-1 operon showing the coding
sequences of each gene and the primers (represented by
flags) used to establish the trans-splicing patterns.

Fig. 4B illustrates the products of the PCR with
SL1 and SL2 specific primers for each of the five
genes. The sequences of the primers used are as
follows:

SL1     TTTAATTACCCAAGTTTGAG  (SEQ. ID. NO:61),
SL2     TTTTAACCCAGTTACTCAAG  (SEQ. ID. NO:62),
SHP141  AAAACTTCTACCAACAATGG  (SEQ. ID. NO:30),
SHP142  CGTAATCTCTCTCGATTAGC  (SEQ. ID. NO:31),
SHP143  CCGTGGGATGGCTACTTGCC  (SEQ. ID. NO:32),
SHP144  TGGATTTGTGGCACGAGCGG  (SEQ. ID. NO:33),
SHP145  TTGATTGCCTCTCCTCGTCC  (SEQ. ID. NO:34),
SHP146  ATCAACATCTGATTGATTCC  (SEQ. ID. NO:35),
SHP130  CATCCAAAAGCAGTATCACC  (SEQ. ID. NO:24),
SHP119  ACATCTTTATCCATTTCTCC  (SEQ. ID. NO:21),
SHP95   TACAGGAATTTTTGAACGGG  (SEQ. ID. NO:12),
SHP99   ATCGATACCACCGTCTCTGG  (SEQ. ID. NO:16).


The gene immediately upstream of gro-1 has
homology to the yeast gene HAM1, and we have renamed
the gene hap-1. We have established its splicing
pattern by reverse transcription PCR and sequencing. This
revealed that hap-1 is composed of 5 exons and produces
an mRNA of 0.9 kb. We also found that sequences which
were predicted to belong to ZC395.7 (now hap-1) are in
fact spliced to the exons of C34E10.1. This is
consistent with our finding that hap-1 is SL2 spliced, as
it puts the end of C34E10.1 very close to the start of
hap-1 (Fig. 4).

The gro-1 gene product

Conceptual translation of the gro-1 transcript
indicated that it encodes a protein of 430 amino acids
highly similar to a strongly conserved cellular enzyme:
dimethylallyldiphosphate:tRNA dimethylallyltransferase
(DMAPP transferase). Fig. 5 shows an alignment of
gro-1 with the published sequences of the E. coli
(P16384) and yeast (P07884) enzymes. Residues where the
biochemical character of the amino acids is conserved
are shown in bold. Identical amino acids are indicated
further with a dot. The ATP/GTP binding site and the
C2H2 zinc finger site are predicted and not
experimental. The point at which the gro-1(e2400)
mutation alters the reading frame of the sequence is
shown. The two alternative initiator methionines in
the yeast sequence, and the putative corresponding
methionines in the worm sequence, are underlined.
Database searches also identified a homologous
human expressed sequence tag (Genbank ID: Z40724). The
human clone has been used to derive a sequence tagged
site (STS). This means that the genetic and physical
position of the human gro-1 homologue is known. It
maps to chromosome 1, 122.8 cR from the top of the
Chr 1 linkage group and between the markers D1S255 and
D1S2861. This information was found in the UniGene
database of the National Center for Biotechnology
Information (NCBI). We have sequenced Z40724 by
classical methods but found that Z40724 is not a full
length cDNA clone, as it contains neither an initiator
methionine nor the poly A tail. We used the sequence of
Z40724 to identify further clones by database searches.
We found one clone (Genbank ID: AA332152) which
extended the sequence 5' by 28 nucleotides, as well as
one clone (Genbank ID: AA121465) which extended the
sequence substantially in the 3' direction but did not
include the poly A tail. We then used AA121465 to
identify an additional clone (AA847885) extending the
sequence to the poly A tail. Fig. 8 shows the full
sequence, with the putative initiator ATG shown in bold
and the sequence of Z40724 shown underlined. A
comparison of the conceptual amino acid sequences for
GRO-1 and hgro-1p is shown in Fig. 9. Amino acid
identities are indicated by a dot. Both sequences
contain a region with a zinc finger motif which is
shown underlined.
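The way overlapping clones extend a sequence in the 5' or 3' direction can be illustrated with a simple suffix-prefix merge. This is only a toy sketch: the two sequences below are invented for illustration and are not the Z40724, AA332152, AA121465 or AA847885 sequences, and the function name and minimum overlap are assumptions.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 8) -> Optional[str]:
    """Join two reads when the end of `left` matches the start of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` bases is found.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Invented toy reads sharing a 9-base overlap (TGACCGTAC).
known_clone = "GGCTAAGCTTGACCGTAC"
extending_clone = "TGACCGTACGGATTCA"

contig = merge_by_overlap(known_clone, extending_clone)
print(contig)  # GGCTAAGCTTGACCGTACGGATTCA
```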

An additional metazoan homologue is represented
by a Drosophila EST (Genbank accession: AA816785). In
E. coli and other bacteria, the gene encoding DMAPP
transferase is called miaA (a.k.a. trpX), and in yeast
it is called MOD5. DMAPP transferase catalyzes the
modification of adenosine 37 of tRNAs that read codons
beginning with U (Fig. 6).
In these organisms the enzyme has been shown to
use dimethylallyldiphosphate as a donor to generate
dimethylallyl-adenosine (dma6A37), one base 3' to the
anticodon (for review and biochemical characterization
of the bacterial enzyme see Persson et al., Biochimie
76: 1152-1160 (1994); Leung et al., J Biol Chem 272:
13073-13083 (1997); Moore and Poulter, Biochemistry
36:604-614 (1997)). In earlier literature this
modification is often referred to as isopentenyl
adenosine (i6A37).

The high degree of conservation of the protein
sequence between GRO-1 and the DMAPP transferases of
S. cerevisiae and E. coli suggests that GRO-1 possesses
the same enzymatic activity as the previously
characterized enzymes. The sequence contains a number
of conserved structural motifs (Fig. 5), including a
region with an ATP/GTP binding motif which is generally
referred to as the 'A' consensus sequence (Walker et
al., EMBO J 1: 945-951 (1982)) or the 'P-loop' (Saraste
et al., Trends Biochem Sci 15: 430-434 (1990)).
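A motif of this kind can be located in a protein sequence with a regular expression. The sketch below assumes the 'A' consensus in the form commonly quoted from PROSITE, [AG]-x(4)-G-K-[ST]; the test sequence is an invented fragment for illustration, not the GRO-1 sequence, and the function name is hypothetical.

```python
import re

# 'A' consensus (P-loop) as commonly quoted from PROSITE:
# [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (start, matched text) for each non-overlapping P-loop match."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(protein)]


# Invented N-terminal fragment used only to exercise the pattern.
toy_protein = "MKLIVIAGPTGSGKTSLAIQ"
print(find_p_loops(toy_protein))  # [(7, 'GPTGSGKT')]
```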

In addition, at the C-terminal end of the GRO-1
sequence, there is a C2H2 zinc finger motif as defined
by the PROSITE database. This type of motif is
believed to bind nucleic acids (Klug and
Rhodes, Trends Biochem Sci 12: 464-469 (1987)).
Although there appears to be some conservation between
the worm and yeast sequences at the C-terminal end of
the protein (Fig. 5), including in the region
encompassing the zinc finger in GRO-1, the zinc finger
motif per se is not conserved in yeast but is present
in humans (Fig. 9).
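A C2H2 zinc finger signature can be searched for in the same way. The sketch below assumes the pattern in the form commonly quoted from PROSITE, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; the peptide is an invented example of a canonical finger, not the GRO-1 or hgro-1p sequence, and the function name is hypothetical.

```python
import re

# C2H2 zinc finger signature as commonly quoted from PROSITE:
# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_zinc_fingers(protein: str):
    """Return (start, matched text) for each C2H2 signature match."""
    return [(m.start(), m.group()) for m in C2H2_FINGER.finditer(protein)]


# Invented peptide containing one canonical finger, flanked by
# unrelated residues.
toy_peptide = "MACPECGKSFSRSDELTRHIRTHTGEK"
print(find_zinc_fingers(toy_peptide))  # [(2, 'CPECGKSFSRSDELTRHIRTH')]
```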

In yeast, DMAPP transferase is the product of the
MOD5 gene, and exists in two forms: one form which is
targeted principally to the mitochondria, and one form
which is found in the cytoplasm and nucleus. These two
forms differ only by a short N-terminal sequence whose
presence or absence is determined by differential
translation initiation at two "in frame" ATG codons
(Gillman et al., Mol & Cell Biol 11: 2382-90 (1991)).
The gro-1 open reading frame also contains two ATG
codons at comparable positions, with the coding
sequence between the two codons constituting a
plausible mitochondrial sorting signal (Figs. 3 and 5).
It is likely therefore that DMAPP transferase in worms
also exists in two forms, mitochondrial and cytoplasmic.

It should be noted, however, that the sequence
of hgro-1 shows only one in-frame methionine before the
conserved ATP/GTP binding site (Fig. 9). As we cannot
be certain that we have determined the sequence of the
full length transcript, it is possible that further 5'
sequence might reveal an additional methionine.
Alternatively, in humans, the mechanism by which the
enzyme is targeted to several compartments might not
involve differential translation initiation. In this
context, it should be noted that the sorting signals
which can be predicted from the sequence of hgro-1p are
reported as highly ambiguous by the prediction
program PSORT II. Furthermore, a conceptual translation
of the Drosophila sequence (AA816785) predicts only one
initiator methionine before the ATP/GTP binding site, as
well as several in-frame stop codons upstream of this
start (Fig. 10), suggesting that no additional upstream
ATG could serve as a translation initiation site. In
the figure, stop codons are indicated by stop,
methionines are indicated by Met, and the conserved
ATP/GTP binding site is underlined.
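The kind of inspection described here, looking for in-frame ATG and stop codons upstream of a conserved site, can be sketched as a codon-by-codon scan. The sequence below is an invented toy example, not the AA816785 or hgro-1 sequence, and the function name is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_frame(sequence: str, frame: int = 0):
    """List in-frame Met (ATG) and stop codons as (position, label) pairs.

    `frame` is the 0-based offset of the first codon; positions are
    0-based nucleotide indices of each codon's first base.
    """
    events = []
    sequence = sequence.upper()
    for start in range(frame, len(sequence) - 2, 3):
        codon = sequence[start:start + 3]
        if codon == "ATG":
            events.append((start, "Met"))
        elif codon in STOP_CODONS:
            events.append((start, "stop"))
    return events


# Toy sequence: an in-frame stop followed by two in-frame ATG codons.
toy_sequence = "TAAGCCATGAAAATGGGC"
print(scan_frame(toy_sequence))  # [(0, 'stop'), (6, 'Met'), (12, 'Met')]
```

An ATG lying upstream of an in-frame stop codon cannot initiate a protein that reads through to the conserved site, which is the argument made above for the Drosophila sequence.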
Expression pattern of GRO-1 

We have also constructed a reporter gene
expressing a fusion protein containing the entire GRO-1
amino acid sequence fused at the C-terminal end to
green fluorescent protein (GFP). The promoter of the
reporter gene is the sequence upstream of gop-1
(Figs. 13A-13C), the first gene in the operon (see
Fig. 4). The promoter sequence is 306 bp long, starting
32 nucleotides upstream of the gop-1 ATG. It is fused
at the exact position upstream of gro-1 where
trans-splicing to SL2 normally occurs.

The genes gop-2 (Fig. 14) and gop-3 (Figs. 15A-
15B) are also located in the operon (see Fig. 4), as
the second and third genes in the operon.

We first constructed the clone pMQ8, in which gro-1
is directly under the promoter for the whole operon,
using the hybrid primers SHP160 (SEQ. ID. NO:38) and
SHP159 (SEQ. ID. NO:37) and the flanking primers SHP161
(SEQ. ID. NO:39) and SHP162 (SEQ. ID. NO:40) in
sequential reactions, each followed by purification of
the products, and finally cloning into pUC18 (Fig. 11).
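The relationship between the hybrid and flanking primers can be checked directly from the sequences listed in the primer table. The sketch below assumes those sequences as transcribed there; it shows that SHP160 contains the reverse complement of SHP159 (as expected for a pair of overlapping hybrid primers) and that SHP162 carries the SHP92 sequence with a 5' extension. The function and variable names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


# Sequences as listed in the primer table (SEQ. ID. NOS: 37, 38, 9, 40).
SHP159 = "AATAAATATTTAAATATTCAGATATACCCTGAACTCTACAG"
SHP160 = "AAACTGTAGAGTTCAGGGTATATCTGAATATTTAAATATTTATTC"
SHP92 = "GATAGTTCCCTTCGTTCGGG"
SHP162 = "ATGACACTGCAGGATAGTTCCCTTCGTTCGGG"

# The two hybrid primers anneal to each other over the junction.
overlap_start = SHP160.find(reverse_complement(SHP159))
print(overlap_start)  # index in SHP160 where the complementary stretch begins

# The flanking primer SHP162 ends with the SHP92 sequence.
print(SHP162.endswith(SHP92))
print(SHP162[: len(SHP162) - len(SHP92)])  # 5' extension added to SHP92
```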

Primers SHP151 (SEQ. ID. NO:36) and SHP170 (SEQ.
ID. NO:44) were then used to amplify part of the
insert in pMQ8, which was cloned into pPD95.77 (gift
from Dr. Andrew Fire), a vector designed to allow a
protein of interest to be translationally fused to
Green Fluorescent Protein (GFP) (Fig. 12).

The reporter construct fully rescues the
phenotype of a gro-1(e2400) mutant upon injection and
extrachromosomal array formation, indicating that the
fusion to the GFP moiety does not significantly inhibit
the function of GRO-1. Fluorescence microscopy indicated
that gro-1 is expressed in most or all somatic cells.
Furthermore, the GRO-1::GFP fusion protein is localized
in the mitochondria, in the cytoplasm, and in the
nucleus.



The hap-1 gene product (Fig. 16) 

hap-1 is homologous to the yeast gene HAM1 as
well as to sequences in many organisms including
bacteria and mammals (Fig. 7).
The origin of the worm and yeast sequences is as
described above and below. The human sequence was
inferred from a cDNA sequence assembled from expressed
sequence tags (ESTs); the accession numbers of the
sequences used were: AA024489, AA024794, AA025334,
AA026396, AA026452, AA026502, AA026503, AA026611,
AA026723, AA035035, AA035523, AA047591, AA047599,
AA056452, AA115232, AA115352, AA129022, AA129023,
AA159841, AA160353, AA204926, AA226949, AA227197 and
D20115. The E. coli sequence is a predicted gene
(accession 1723866).

Mutations in HAM1 increase the sensitivity of
yeast to the mutagenic compound 6-N-hydroxylaminopurine
(HAP), but do not increase spontaneous mutation
frequency (Nostov et al., Yeast 12:17-29 (1996)). HAP is
an analog of adenine, and in vitro experiments suggest
that the mechanism of HAP mutagenesis is its conversion
to a deoxynucleoside triphosphate which is incorporated
ambiguously for dATP and dGTP during DNA replication
(Abdul-Masih and Bessman, J Biol Chem 261(5): 2020-2026
(1986)). The role of the Ham1p gene product in
determining sensitivity to HAP remains unclear.
Explaining the pleiotropy of miaA and gro-1

Mutations in miaA, the bacterial homologue of
gro-1, show multiple phenotypes and affect cellular
growth in complex ways. For example, in Salmonella
typhimurium, such mutations result in 1) a decreased
efficacy of suppression by some suppressor tRNAs, 2) a
slowing of ribosomal translation, 3) slow growth under
various nutritional conditions, 4) altered regulation
of several amino acid biosynthetic operons, 5) sensitivity
to chemical oxidants and 6) temperature sensitivity
for aerobic growth (Ericson and Bjork, J. Bacteriol.
166: 1013-1021 (1986); Blum, J. Bacteriol. 170:
5125-5133 (1988)). Thus, MiaAp appears to be important
in the regulation of multiple parallel processes of
cellular physiology. Although we have not yet explored
the cellular physiology of gro-1 mutants along the
lines which have been pursued in bacteria, the apparently
central role of miaA is consistent with our findings
that gro-1, and the other genes with a Clk phenotype,
regulate many disparate physiological and metabolic
processes in C. elegans (Wong et al., Genetics
139: 1247-1259 (1995); Lakowski and Hekimi, Science
272: 1010-1013 (1996); Ewbank et al., Science 275:
980-983 (1997)).

In addition to the various phenotypes discussed
above, miaA mutations increase the frequency of
spontaneous mutations (Connolly and Winkler,
J Bacteriol 173(5): 1711-21 (1991); Connolly and
Winkler, J Bacteriol 171: 3233-46 (1989)). As
described in the previous section, we have preliminary
evidence that gro-1(e2400) also increases the frequency
of spontaneous mutations in worms.

How can the alteration in the function of DMAPP
transferase result in so many distinct phenotypes?
Bacterial geneticists working with miaA have generally
suggested that this enzyme and the tRNA modification it
catalyzes have a regulatory function which is mediated
through attenuation (e.g. Ericson and Bjork, J. Bacteriol.
166: 1013-1021 (1986)). Attenuation is a phenomenon
by which the transcription of a gene is interrupted
depending on the rate at which ribosomes can
translate the nascent transcript. Ribosomal translation
is slowed in miaA mutants, and this slowing could,
through an effect on attenuation, alter the expression of
the many genes that are regulated by attenuation.

gro-1(e2400) also produces pleiotropic effects
and, in addition, displays a maternal effect, suggesting
that it is involved in a regulatory process (Wong
et al., Genetics 139: 1247-1259 (1995)). However,
attenuation involves the co-transcriptional translation
of nascent transcripts, which is not possible in
eukaryotic cells, where transcription and translation are
spatially separated by the nuclear membrane. If the
basis of the pleiotropy in miaA and gro-1 is the same,
then a mechanism distinct from attenuation has to be
involved. Below we argue that this mechanism could be
the modification by DMAPP transferase of adenine
residues in DNA, in addition to modification of tRNAs.
A role for gro-1 in DNA modification?

We observed that gro-1 can be rescued by
maternal effect, so that adult worms homozygous for the
mutation, but derived from a mother carrying one wild-type
copy of the gene, display a wild-type phenotype, in
spite of the fact that such adults are up to 1000-fold
larger than the egg produced by their mother. It is
unlikely that enough wild-type product can be deposited
by the mother in the egg to rescue an adult which is
1000 times larger. This observation therefore suggests
that gro-1 can induce an epigenetic state which is not
altered by subsequent somatic growth. One of the best
documented epigenetic mechanisms is imprinting in
mammals (Lalande, Annu Rev Genet 30: 173-196 (1996)), which
is believed to rely on the differential methylation of
genes (Laird and Jaenisch, Annu Rev Genet 30: 441-464 (1996);
Klein and Costa, Mutat Res 386: 103-105 (1997)).
Modification of bases in DNA has also been linked to
regulation of gene expression in the protozoan Trypanosoma
brucei. The presence of beta-D-glucosyl-hydroxymethyluracil
in the long telomeric repeats of T. brucei
correlates with the repression of surface antigen gene
expression (Gommers-Ampt et al., Cell 75: 1129-1136
(1993); van Leeuwen et al., Nucleic Acids Res 24:
2476-2482 (1996)).
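
The dilution argument made at the start of this section can be written out as a short calculation. This is a sketch only. It assumes, for illustration and not as a measurement, that a fixed amount n of wild-type product is deposited in the egg, that none is synthesized or degraded afterwards in the homozygous mutant, and that the adult volume is 1000 times the egg volume. The symbols n, V and c are introduced here and do not appear elsewhere in this description.

```latex
c_{\mathrm{egg}} = \frac{n}{V_{\mathrm{egg}}}, \qquad
c_{\mathrm{adult}} = \frac{n}{V_{\mathrm{adult}}}
                   = \frac{n}{1000\,V_{\mathrm{egg}}}
                   = \frac{c_{\mathrm{egg}}}{1000}
```

Under these assumptions the maternal product would be diluted to about 0.1% of its concentration in the egg, which is why simple persistence of the product seems an unlikely explanation for the rescue.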

gro-1 and miaA increase the rate of spontaneous
mutations, which is generally suggestive of a role in
DNA metabolism, and can be related to the observation
that methylation is linked to spontaneous mutagenesis,
genome instability, and cancer (Jones and Gonzalgo,
Proc. Natl. Acad. Sci. USA, 94: 2103-2105 (1997)).

Does gro-1 have access to DNA? Studies with
mod5, the yeast homologue of gro-1, have shown that
there are two forms of Mod5p: one is localized to the
nucleus as well as to the cytoplasm, and the other
is localized to the mitochondria as well as to the
cytoplasm (Boguta et al., Mol. Cell. Biol. 14: 2298-2306
(1994)). The nuclear localization is striking, as
isopentenylation of nuclear-encoded tRNA is believed to
occur exclusively in the cytoplasm (reviewed in Boguta
et al., Mol. Cell. Biol. 14: 2298-2306 (1994)).
Furthermore, studies of the gene maf1 have shown that
when Mod5p is mislocalized to the nucleus, the
efficiency of certain suppressor tRNAs is decreased, an
effect known to be linked to the absence of the tRNA
modification (Murawski et al., Acta Biochim. Pol. 41:
441-448 (1994)). Finally, as described in the previous
section, gro-1 contains a zinc finger, a nucleic acid
binding motif. The zinc finger could bind tRNAs, but
as it lies in a C-terminal domain of gro-1 and human
hgro-1 that has no equivalent in miaA, it is clearly
not necessary for the basic enzymatic function. We
speculate that it might be necessary to increase the
specificity of DNA binding in the large metazoan
genome. It should also be noted that the second form
of Mod5p, which is localized to mitochondria, also has
the opportunity to bind and possibly modify DNA, as it
has access to the mitochondrial genome. See the
section entitled "A role for gro-1 in a
central mechanism of physiological coordination" for an
alternative possibility as to the function of GRO-1 in
the nucleus.

miaA and gro-1 are found in complex operons

We have found that gro-1 is part of a complex
operon of five genes (Fig. 4). It is believed that
genes are regulated coordinately by single promoters
when they participate in a common function (Spieth et
al., Cell 73: 521-532 (1993)). In some cases, this is
well documented. For example, the proteins LIN-15A and
LIN-15B, which are both required for vulva formation in
C. elegans, are unrelated products of two genes
transcribed in a common operon (Huang et al., Mol Biol Cell
5(4): 395-411 (1994)). One of the genes in the gro-1
operon is hap-1, whose yeast homologue has been shown
to be involved in the control of mutagenesis (Nostov et
al., Yeast 12: 17-29 (1996)). Under the hypothesis
that gro-1 modifies DNA, this suggests an involvement of
hap-1 in this or similar processes. The presence in
the same operon also suggests that all five genes might
collaborate in a common function. The phenotype of
gro-1 suggests that this function is regulatory. In
this context, it should be noted that miaA is also part
of a particularly complex operon (Tsui and Winkler,
Biochimie 76: 1168-1177 (1994)), although, except for
miaA/gro-1, there are no other homologous genes in the
two operons.

A role for gro-1 in a central mechanism of physiological
coordination

We have speculated that the genes with a Clk
phenotype might participate in a central mechanism of
physiological coordination, probably including the
regulation of energy metabolism. clk-1 encodes a
mitochondrial protein (unpublished observations), and
its homologue in yeast has also been shown to be
mitochondrial (Jonassen, T. (1998) Journal of Biological
Chemistry 273:3351-3357). The yeast clk-1 homologue is
involved in the regulation of the biosynthesis of
ubiquinone (Marbois, B.N. and Clarke, C.F. (1996)
Journal of Biological Chemistry 271:2995-3004).
Ubiquinone, also called coenzyme Q, is central to the
production of ATP in mitochondria. In worms, however,
we have found that clk-1 is not strictly required for
respiration. How might gro-1 fit into this picture?

One link is that dimethylallyl diphosphate (DMAPP) is
known to be the precursor of the lipid side-chain of
ubiquinone. In bacteria, ubiquinone is the major lipid
made from DMAPP. In eukaryotes, cholesterol and its
derivatives are also made from DMAPP. Interestingly,
C. elegans requires cholesterol in the growth medium
for optimal growth. This link, however, remains
tenuous, in particular in the absence of an understanding
of the biochemical function of CLK-1.

In several bacteria, the adenosine modification
carried out by DMAPP transferase is only the first step
in a series of further modifications of this base
(Persson et al., Biochimie 76: 1152-1160 (1994)).
These additional modifications have been proposed to
play the role of a sensor for the metabolic state of
the cell (Buck and Ames, Cell 36: 523-531 (1984);
Persson and Bjork, J. Bacteriol. 175: 7776-7785
(1993)). For example, one of the subsequent steps, the
synthesis of 2-methylthio-cis-ribozeatin, is carried
out by a hydroxylase encoded by the gene miaE. When
cells lack miaE, they become incapable of using
intermediates of the citric acid cycle such as fumarate
and malate as the sole carbon source.



Another link to energy metabolism springs from
the recent biochemical observations of Winkler and
co-workers using purified DMAPP transferase (E. coli
MiaAp) (Leung et al., J Biol Chem 272: 13073-13083
(1997)). These investigators observed that the enzyme
is competitively inhibited by phosphate nucleotides
such as ATP or GTP. Furthermore, using their estimates
of the Km of the enzyme and of its concentration in the cell,
they calculated that the level of inhibition of the
enzyme in vivo would just allow the enzyme to modify
all tRNAs, but that any further inhibition would leave
some tRNAs unmodified. This suggests that the exact level
of modification of tRNA (or of DNA) could be exquisitely
sensitive to the level of phosphate nucleotides.
Superficially, this is consistent with the phenotypic
observations. The state of mutant cells which lack
DMAPP transferase entirely would be equivalent to that of
cells in which very high levels of ATP completely inhibit
the enzyme. Such cells might therefore turn down the
ATP-generating processes in response to the signal
provided by undermodified tRNAs (or DNA).
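
The sensitivity described above can be illustrated with the standard rate law for competitive inhibition, v/Vmax = [S] / (Km(1 + [I]/Ki) + [S]). The following is a minimal sketch only: the function name and every numerical value (Km, Ki, tRNA and ATP concentrations) are hypothetical placeholders chosen for illustration, and are not the values measured by Leung et al.

```python
def fractional_rate(substrate, km, inhibitor, ki):
    """Return v / Vmax for an enzyme under competitive inhibition.

    A competitive inhibitor raises the apparent Km by the factor
    (1 + [I] / Ki) and leaves Vmax unchanged.
    """
    apparent_km = km * (1.0 + inhibitor / ki)
    return substrate / (apparent_km + substrate)


# Hypothetical, illustrative values in arbitrary concentration units.
KM = 1.0    # assumed Km of the transferase for its tRNA substrate
KI = 1.0    # assumed inhibition constant for ATP
TRNA = 1.0  # assumed concentration of unmodified tRNA

if __name__ == "__main__":
    for atp in (0.0, 1.0, 10.0, 100.0):
        rate = fractional_rate(TRNA, KM, atp, KI)
        print(f"[ATP] = {atp:6.1f}  ->  v/Vmax = {rate:.3f}")
```

With these placeholder values the relative rate falls steadily as the assumed ATP level rises, from 0.5 with no inhibitor to roughly 0.01 at a 100-fold excess. This is the qualitative behaviour the argument relies on: an enzyme operating near the limit of its capacity would leave tRNAs unmodified as soon as nucleotide levels increase.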

More generally, GRO-1 could act in the crosstalk
between nuclear and mitochondrial genomes. The nuclear
and mitochondrial genomes both contribute gene products
to the mitochondrial energy-producing machinery, and
these physically separate genomes must therefore
exchange information somehow to coordinate their
contributions (reviewed in Poyton, R.O. and McEwen, J.E.
(1996) Annu. Rev. Biochem. 65:563-607). Furthermore,
the energy-producing activity of the mitochondria is
essential to the rest of the cell, and the needs of a
particular cell at a particular time must somehow be
conveyed to the organelle to regulate its activity. GRO-1
could participate in this coordination in the following
manner. GRO-1 is found in three compartments, the
nucleus, the cytoplasm and the mitochondria (see
above), and thus has the opportunity to regulate gene
expression in more than one way. How could its action
coordinate gene expression between compartments? GRO-1
could partition between the mitochondria and the
nucleus, and its relative distribution could be
determined by the amount of RNA (or mtDNA) in the
mitochondria (Parikh, V.S. et al. (1987) Science
235:576-580). For example, if the cell is rich in
mitochondria, much GRO-1 will be bound there, which
could result in a relative depletion of activity in the
cytoplasm, with regulatory consequences for the
translation machinery. Binding of GRO-1 in the nucleus
could have similar consequences and provide information
about nuclear gene expression to the translation
machinery.

While the invention has been described in
connection with specific embodiments thereof, it will be
understood that it is capable of further modifications
and this application is intended to cover any
variations, uses, or adaptations of the invention following,
in general, the principles of the invention and
including such departures from the present disclosure
as come within known or customary practice within the
art to which the invention pertains and as may be
applied to the essential features hereinbefore set
forth, and as follows in the scope of the appended
claims.

WHAT IS CLAIMED IS:

1. A gro-1 gene which has a function at the level
of cellular physiology involved in developmental rate
and longevity, wherein gro-1 mutations cause a longer
life and an altered cellular metabolism relative to the
wild-type, wherein gro-1 gene has the identifying
characteristics of nucleotide sequence set forth in
SEQ ID NO:3.

2. The gro-1 gene of claim 1, which codes for a
GRO-1 protein having the amino acid sequence set forth
in Figs. 9A-9B as deduced from SEQ ID NO:3.

3. A gro-1 co-expressed gene which comprises a
gop-1 gene which codes for a GOP-1 protein having the
amino acid sequence set forth in Figs. 13A-13C (SEQ ID.
NO:4); wherein said gop-1 gene is located in the gro-1
operon and said gop-1 gene is transcriptionally
co-expressed with gro-1 gene present in said operon.

4. A gro-1 co-expressed gene which comprises a
gop-2 gene which codes for a GOP-2 protein having the
amino acid sequence set forth in Figs. 14A-B (SEQ ID.
NO:5); wherein said gop-2 gene is located in the gro-1
operon and said gop-2 gene is transcriptionally
co-expressed with gro-1 gene present in said operon.

5. A gop-3 gene which codes for a GOP-3 protein
having the amino acid sequence set forth in Figs. 15A-
15B (SEQ ID. NO:6); wherein said gop-3 gene is located
in the gro-1 operon and said gop-3 gene is
transcriptionally co-expressed with gro-1 gene present
in said operon.

6. A hap-1 gene which codes for a HAP-1 protein
having the amino acid sequence set forth in Figs. 16A-B
(SEQ ID. NO:7); wherein said hap-1 gene is located in
the gro-1 operon and said hap-1 gene is
transcriptionally co-expressed with gro-1 gene present
in said operon.

7. A GRO-1 protein which has a function at the
level of cellular physiology involved in developmental
rate and longevity, wherein said GRO-1 protein is
encoded by the gene of claims 1 and 2.

8. A mutant GRO-1 protein which has the amino acid
sequence set forth in Fig. 3D.

9. A GRO-1 protein which has the amino acid
sequence set forth in Figs. 3A-3C (SEQ ID. NO:2).

10. A GRO-1 co-expressed protein which comprises a
GOP-1 protein encoded by the gene according to claim 3;
wherein said protein which has the amino acid sequence
set forth in Figs. 13A-13C (SEQ ID. NO:4) and human
homolog thereof.

11. A GRO-1 co-expressed protein which comprises a
GOP-2 protein encoded by the gene according to claim 4;
wherein said protein which has the amino acid sequence
set forth in Fig. 14 (SEQ ID. NO:5) and human homolog
thereof.

12. A GOP-3 protein encoded by the gene according to
claim 5; wherein said protein which has the amino acid
sequence set forth in Figs. 15A-15B (SEQ ID. NO:6) and
human homolog thereof.

13. A HAP-1 protein encoded by the gene according to
claim 6; wherein said protein which has the amino acid
sequence set forth in Fig. 16 (SEQ ID. NO:7).

14. A method for the diagnosis and/or prognosis of 
cancer in a patient, which comprises the steps of: 

a) obtaining a tissue sample from said patient; 

b) analyzing DNA of the obtained tissue sample of 
step a) to determine if the human gro-1 gene is 
altered, wherein alteration of the human gro-1 gene is 
indicative of cancer. 

15. A mouse model of aging and cancer, which
comprises a gene knock-out of murine gene homologous to
gro-1 gene of claims 1 and 2.

16. A method of regulating physiological processes
of tissues, organs and/or whole organism of a host
which comprises a compound interfering with enzymatic
activity of GRO-1 of claim 7, 8 or 9.

17. A method of regulating physiological processes 
of tissues, organs and/or whole organism of a host 
which comprises a compound interfering with enzymatic 
activity of GOP-1 of claim 10. 

18. A method of regulating physiological processes 
of tissues, organs and/or whole organism of a host 
which comprises a compound interfering with enzymatic 
activity of GOP-2 of claim 11. 

19. A method of regulating physiological processes
of tissues, organs and/or whole organism of a host
which comprises a compound interfering with enzymatic
activity of GOP-3 of claim 12.

20. A method of regulating physiological processes
of tissues, organs and/or whole organism of a host
which comprises a compound interfering with enzymatic
activity of HAP-1 of claim 13.

ABSTRACT OF THE INVENTION

The invention relates to the identification of
the gro-1 gene and to the demonstration that the gro-1
gene is involved in the control of a central physiological
clock. Also disclosed are four other genes located
within the same operon as the gro-1 gene.
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SL2 MIFRKFLNFLKPYKMR 16 



aaaatatcgtcaggaaataataacatttcagatataccctgaactctacagtttATGmTTCAGGmTTTCTGMTTTTCTGAMCCTTACAAJiraC 1394 



T D P I I F V I G C T G T G K S D L G V A I A K K Y G G E V I S V 49 

GMCGGATCCGATTATTTTCGTGATO 1494 

T SHP109 



D S N Q F Y K G L D I A T I K I T 66 

MKM mmAT 1594 

EEESEGIQHHMMSFLfiPSESSSYNVHSFREVTL 99 



SHP94 



D L I K KIRARSKIPVIVG 116 

GATCTTATTMgtgcttaattcgccactttttgaacttgaM^ 1194 



GTTYYAESVLYENNLIETNTSDDVDSKSRTSSE 149 

GAffiraCTTATTATGCTGfflGTGTCCTTTATGAGMTMTCTGATTGMCCMCACTM 1894 

SHP96 * 



SSSEDTEEGISNQELWDELKKIDEKSALLLHPN 182 
ATCGTCATCTGAAGACACWMIMTAGTMm^ 1994 



gro-1 continued...

l R Y R V 0 R A L Q I F R E T G 

MTCGTTATCGAGmCAGAGAGCATTGCAAATTTTCAGAGAMCTGgtaattgatttgcaaatttccagattaaaaacaaatcaagtaaagttttttgca 



138 
2094 



I R K S E L V E K Q K S D E T V D L G 6 R L R F D N S L V I F « D 231 

gGMTCCGiyy\MGTGMCTO 2194 

ATPEVLEERLDGRVDKMIKLGLKNELIEFYHE 263 

ATGCAACACCTGMGTTTTAGMGAMGACTTGATGGMGAGTTGJTAM 2294 

aaatatttgaatttttccagaaaaaaaaagaaaattttttattattttgtttttttttcattctttactattttccaaaaaagtttaaacttttgaaaac 2394 

B A E Y 267 

tgttcagaaaatgttcgtgtatttattttagcttactgaggcattatttcattgtgatttttactatactctataaactaaattttcagCACGCCGAGTA 2494 

INHSKYGVMQCIGLKEFVPWLNLDPSBRDTLNG 300 

CATMTCACAGCAMTATGGTGTTO 2594 
^-CG^oOtesion ™ * 

D R L F K Q G CDDVKLHTRQY 318 

GATAAATTGTTCMGCMGGgtaatttaaatttattttcaatttttataaattccaagctattttcagATGCGATGRTGTGMGCTTCACACTCGACMT 2694 



gro-1 continued...

ARRQRRWYRSRLLKRSDGDR 33 
ATGCACGGCGCCAGAGACGGTGGTATCGATCGftGACTTTTAAAACGGTCGGATGGTGATCGGgtatgttgattttaaaaaaattgaatttttaaagaact 279 
! SHP99 



tttttactaaattaacaaagttattggctgaaaatggctgaaaattatagtaaaactaatcaaaaaaattgaaattttgaattaaagtcataaagtgacg 289 

KMASTKMLD 34 

accagaaaattaaaaaaaaacatttttctattttaattaattcactctacttcactttaaaaataattttcagAAAATGGCAAGTACAAAAATGCTGGAT 299 



T S D K I R I I S D G M D I V D fi » M H G I D L F E D 37 

ACATCTGACMGTACCGMTMTTAGTGATGGMTGGACATTGTTGATCMTGGATGMTGGMTCGATCTATTTGMGATgtaaaatttcacaaattCt 309 

ISTDTNPILKGSDANILLNCEI 39 

aaaatttccgaatcacaaattaaaatttctacagATCTCCACAGACACCAATCCMTTCTAAAAGGGTCCGATGCmTATTCTGCTGAATTGTGAAATC 319 

CNISHTGKDNW QKEIDGKK 41 

TGTMTATTTCAATGACTGGAAMGATAATTGgtttgtttcaatacatattataatttcgaaatgaattttttcagGCAGAMCATATCGATGGGAAAM 329 
SHP110 T T SHP100 

HKHHAKQKKLAETRTi 43 

GCACMGCATCATGCTAAGCAAAAGAAATTGGCAGAGACTCGCACAtaagacgctatatttattttttgttaacttaaattatttttgttgttgattgtt 339 

poiyA 

ctctaaataaaaaaacagctcagagagaa^aggcgctcgtccacatctccgacgatagtcaacccgaacgaagggaactatctttaattgtcagtga 349 

' SHP92

tgatttttactatactctataaactaaattttcagCACGCCGAGTACATAAATCACAGCAAATATGGTGTCACG 1197

HAEYINHSKYGVT 276 

TTfjGTCTTMMTTCGTTCCA^ 1272 

I. V L K H S F H G S I I I HQ K K I H S M G I N C 301 



TCMGCAAGGgtaatttaaatttattttcaatttttataaattccaagctattttcagATGCGAfGATGtgaagcttc 1350 
S 8 I D A H H • 308

[Figure labels: 925 bp, 421 bp; gop-1 gop-2 gop-3 hap-1 gro-1]

Sequence of GRO-1 and homologues



Celegans i MIFRRFLNFLKPYKMRTDPIIFVI6CTGT6KSDLGVAIAKKYGGEVISVDSMQFYKGLDIATNKITEEESEGIQ 
S.cerevisiae 1 MLKGPLRGCLM1SKKVI VI AGTTGVG KSQLSI QLAQKFNGE VINSDSMQVYKDI P 1 1 TNKHPLQEREGI P 
Uo H i MSDISRASLPKAIFLMGPTASGKTALAIELRKILPVELISVDSALIYKGMDIGTAKPNAEELLAAP 

ATP/GTP 
binding site 



Celegw 16 HHMSFLNPSESSSYNVHSFREVTLDLIKKIRARSKIPVIVGGTTYYAESVLYENNLIETNTSDDVDSKSRTSSE 
S.cerevisiae 72 HVMKHVDWSE--EYYSHRFETECMNAIEDIHRRGKIPIWGGTHYYLQTLFNRRVDTKSSERKLTRKQLDILES 
l.coli 68 RLLDIRDPSQ— AYSAADFRRDALAEHADITAAGRIPLLVGGTMLYFKALLEGLSPLPSADPEVRARIEQQAAE 



I I I I I I III 



C.elejans 151 SSEDTEEGISNQELWDELKKIDEKSALLLHPHNRYRVQRALQIFRETGIRKSELVEKQKSDETVDLGGRLRFDN 

S.cerevisiae hi DPDV IYKTLVKCDPDIATKYHPNDYRRVQRMLEIYYKTGKKPSETFKEQK ITLKFD- 1 

sM hjGWES LHRQLQEVDPVAAARIHPNDPQRLSRALEVFFISGKTLTELTQTSG DALPYQV
iii i i i i

CeUgm 226 LVIFMDATPEVLEERLDGRVDKMIKL6LKNELIEFYKEHAEYINHSKYGVMQCIGLKEFVPWLNLDPSERDTLN 
S.csrevisiae 205 LFLWLYSKPEPLFQRLDDRVDDMLERGALQEIKQLYEYYSQMFTPEQCENGVWQVIGFKEFLPWLTGKTDDNT 
I.coli 202 QFAIAPASRELLHQRIEQRFHQMLAS6FEAEVRALFARGDLHTDLPSIRCVGYRQMWSYLEGEISYDEMVYRGV 



III I II I 

I 

C.alagus mi DKLFKQGCDDVKLHTRQYARRQRRWYRSRLLKRSDGDRKMASTKMLDTSDKYRIISDGKDIVDQWMNGIDLFED 
S.cewisiu 280 KLEDCIERMKT--RTRQYAKRQVKWIKKMLIPDIKGDILLDATDLSQWDTNASQRAIAISNDFISNRPIKQERA 
I.coli m ATRQLAKRQITWLRGWEGVHWLDSERPEQARDEVLQVVGAIAG 



C.ehgms 
S.ceieusi&s 



. C2H2 zinc finger . 

376 STDTNPILKGSDANILLN CEICUISMTGKD10RHIDGRKH KHHARQKKLATRT 

353 KALEELLSKGETTMKKLDDWTHYTRNVCRNADGKNWAIGEKYWKIHLGSRRHKSNLKRNTRQADFEKWKINKK 




Sequence of HAP-1 and its homologues



H. sapiens
C. elegans
S. cerevisiae
E. coli



•Mill , 

MAASLVGKKIVFVT6NAKKLEEWQILGDKFP CTLVAQKIDLPEYXG-EPDEI SIQKCQE 

MLYILWKLNYLQKKMSLRKINFVTGNVKKLEEVKAILKNFE VSNVDVDLDEFQG-EPEFIAERRCRE 

MSNNEIVPVTGNANKLKEVQSILTQEVDNNNKTIHLINEA1DLEELQDTDLNAIALAKGKQ 
MQKWLATGNVGKVRELASLLSDFGLD IVAQTDLGVDSAEETGLTFIENAILKA 



H. sapiens
C. elegans
S. cerevisiae
E. coli



• • • Ml | , 

AVRQV-QG-PVLVEDTCLCFNALGXLPGPYIKWFL--EKLKPEGLHQLLAGFED KSAYALCTFALSTGDP 

AVEAV-KG-PVLVEDTSLCFNAMGGLPGPYIKWFL— KNLKPEGLHNMLAGFSD KTAYAQCIFAYTEG-L 

AVAALGKGK?VFVEDTALRFDEFNGLPGAYIKWFL— KSMGLEKIVKMLEPFEN K1EAVTTICFADSRG 

RHAAKVTALPAIADDSGLAVDVLGGAPGIYSARYSGEDATDQKNLQKLLETMKDVPDDQRQARFHCVLVYLRHAE 



• • • • Ml t 



H. sapiens
C. elegans



II I Ml • 



SQPVRLFRGRTSGRIV-APRGCQDFGWDPCFQP-DGYEQTYAEHPKAEKNAVSHRFRALLELQEYFGSLAA 
GKPIHVFAGKCPGQIV-APRGDTAFGWDPCFQP-DGFKETFGEMDKDVKNEISHRAKALELLKEYFQKN 
S, cerevism E YHFFQGITRGKIV-PSRGPTTFGWDSIFEPFDSHGLTYAEHSKDAKNAISHRGKAFAQFKEYLYQNDF 
E. coli DPTPLVCHGSWPGVITREPAGTGGFGYDPIFFV-PSEGKTAAELTREEKSAISHRGQALKLLLDALRNG

mRNA sequence of human homologue of gro-1: hgro-1

CTGCCATAAG ATGGCGTCCG TGGCGGCTGC ACGAGCAGTT CCTGTGGGCA
GTGGGCTCAG GGGCCTGCAA CGGACCCTAC CTCTTGTAGT GATTCTCGGG
GCCACGGGCA CCGGCAAATC CACGCTGGCG TTGCAGCTAG GCCAGCGGCT
CGGCGGTGAG ATCGTCAGCG CTGACTCCAT GCAGGTCTAT GAAGGCCTAG
ACATCATCAC CAACAAGGTT TCTGCCCAAG AGCAGAGAAT CTGCCGGCAC
CACATGATCA GCTTTGTGGA TCCTCTTGTG ACCAATTACA CAGTGGTGGA
CTTCAGAAAT AGAGCAACTG CTCTGATTGA AGATATATTT GCCCGAGACA
AAATTCCTAT TGTTGTGGGA GGAACCAATT ATTACATTGA ATCTCTGCTC
TGGAAAGTTC TTGTCAATAC CAAGCCCCAG GAGATGGGCA CTGAGAAAGT
GATTGACCGA AAAGTGGAGC TTGAAAAGGA GGATGGTCTT GTACTTPAP A
AACGCCTAAG CCAGGTGGAC CCAGAAATGG CTGCCAAGCT GCATPPATAT
GACAAACGCA AAGTGGCCAG GAGCTTGCAA GTTTTTGAAG [illegible]
CTCTCATAGT GAATTTCTCC ATCGTCAACA TACGGAAGAA GGTGGTGGTP
CCCTTGGAGG TCCTCTGAAG TTCTCTAACC CTTGCATCCT TTGGCTTTAT
GCTGACCAGG CAGTTCTAGA TGAGCGCTTG GATAAGAGGG TGGATGAPAT
GCTTGCTGCT GGGCTCTTGG AGGAACTAAG AGATTTTCAC AGACGGTATA
ATCAGAAGAA TGTTTCGGAA AATAGCCAGG ACTATCAACA

CAATCAATTG GCTTCAAGGA ATTTCACGAG TACCTGATCA CTGAGGGAAA
ATGCACACTG GAGACTAGTA ACCAGCTTCT AAAGAAAGGA CCTGGTCCCA
TTGTCCCCCC TGTCTATGGC TTAGAGGTAT CTGATGTCTC GAAGTGGGAG
GAGTCTGTTC TTGAACCTGC TCTTGAAATC GTGCAAAGTT TCATCCAGGG
CCACAAGCCT ACAGCCACTC CAATAAAGAT GCCATACAAT GAAGCTGAGA
ACAAGAGAAG TTATCACCTG TGTGACCTCT GTGATCGAAT CATCATTGGG
GATCGCGAAT GGGCAGCGCA CATAAAATCC AAATCCCACT TGAACCAACT
GAAGAAAAGA AGAAGATTGG ACTCAGATGC TGTCAACACC ATAGAAAGTC
AGAGTGTTTC CCCAGACTAT AACAAAGAAC CTAAAGGGAA GGGATCCCCA
GGGCAGAATG ATCAAGAGCT GAAATGCAGC GTTTAAGAGA CATGTCCAGT
GGCCTTTGGA AAGGTGGTGG GGATCCAGTT CAGGAGGGAG GGGTATGTTT

GTCTCCCAGT CTGGGCAAAG GAGTGCTATG CGGAATTCTC TGCATAGCAG 
AAAAGCTCCC ACCATTTTCT TTTGATGTGG TTTTAAAGTC TCACGTTCTC 
TATAATAGAA ACAGCAGGTC TTGTCAGCTC CTTGTGTGGC TGATGTGTCT 
GGAAATGATG TAGTTCAGGA AAGCATTTTT TTTTTCTTTG AACCTTAAAG 
GTTCTATTAT TAAAAGCAGC ACAGATTCCA CATTTTTATA CATGAGGATC 
TTCTTTGTGG TGAATACCAG GATTGACTGC ATCCCTTTAA AAGAAGTTTT 
ATGTCCCTGA CTCTGGCTAA AATTATCTAA TTTCCAGATG CTTTTGTAGA 
TGACTGAAGT ATTTGTGAGC CACATATTGG GAGTTCTAGA TTTGAGTGAA 
TGGCAGGAAA GGGCCATCTC CATTGAGATG ATTAAGTGAA CCAAACTAGT 
TCTCGGAATT CTACAGAGAA GGAGGGAATC AGACTGAGGA AGCTGTGACA 
TAGGACTTGA AGACCAAAGA CTTTGAAATT TGCGAGCTGC TCATGTGTGA 
GTTATTATCA CTGCTGTCTT TCTATTGAGT TACAAATCTA TATTTTTATT 
GAAGTTTAAA TAAAGAAAAA ATTTACAAGA AAAAAAAAAA A

GRO-1 and its human homologue hgro-1p



" • • MMM % M| | 1 1 1 1 | MM M 

hgro-lp ®SVMM^ 
SD-1 MKFLNF^^ 



* I Ml M • t I M M I M Ml MM M M t 

hgro-lp KVSAPRIMSFW 

(30-1 nTEEESEGIQHHMMSFD^SESSSYMSFREVTLDLIKK 



• M I I I I IM I I M I MM 

hp-ip TRPQEffiTEMDimEKEDGLV — m^mmmmmsgss^L 

GBD-1 EMSDDVDSMSSESSSMEGISN[M
M • ft Ml I I I M MM IM I M M I

SEFL1QHTETO 



t 9 I M Ml I II 



SMSQDYQHGIFQSIGFKEFHEYLITEGKCTLETSNQLLKRGPGPIVPPVYGLE 

YfflSKY-lMPGLKEFVMPSm 



M I 



VSDVMESVLEPmeSn^imTPIMmElfRSMr 

RSDGCRKMASTKMLDT^^ 



II I I I I I I I 



0)miH(MMKSm 

C2H2 zinc finger

Structure of pMQ8



Sad 
gtacgtg [gagctcj- 



SHP161 

-► 



atcgtgttccaggtgcjaactatatattgagcaggaggacgagttgtttgtttcatgctgcttaaaaataaaaatg 

. J SHP151 * 
cagcgagclgca^i 

Sphl 

gaaaattgagtcaaaaagttgagataaaacaaattaaaacaattttctgaaaaataaacaactgaaatttgaagtaataaacaacacgcgaaaacgttat 



ttcggagcatcgtttgagaagtaaaactttttttcggcgcacccttgtgcgcagtttttatcttctcttttaatttaattttcaagctaaatctttcttt 



proioter ■ 



ttaaactttq 



SHP160 



gro-I 



SHP159 MIFRKFLNFLKPYKMR 



.^aataaatatttaaatattcag atataccctgaactctacagtttATGATATTCAGGAMTTTCTGAATTTTCTGARACCTTACAAMTGC 



T D P I I F V I G C T G T G K S D L G V I I A K K Y G G E V I S V 
GAACGGATCCGATTATTTTCGTGATTGGGTGCACTGGAACCGGGAAAAGTGATCTTGGAGTGGCAATTGCAAAGAAATATGGAGGAGAGGTGATTAGTGT
DSMQFYKG L D I I T H . . .
AGATTCMTGCAATTTTATAMGgtacatgggttttgtttcaattttaaattaattaattttcgtttttcagGACTTGACATTGCCACGAAT 



. . . HAKQKKLAETRT • 

taagacgctatatttattttttgttaacttaaattatttttgttgttgattgtt 

[tctaga] tatact 
Xbal 



; .CMGCTRfiGCAMAG RAATTGGCAGAGACTCGCACR' 

* SHP170 ' 



ctctaaataaaaaaacagctcagagagaagattaggcgctcgtccacatctccgacgatagtcaacccgaacgaagggaactatctttaattgtcagtga 



' SHP162 1^ 



[ctgcagltgtcat 
PstI 



Construction of pMQ18



SHP151 



SHP170 



promoter 
Sphl 



pPD95.77 



A a a /\ a a SHP1515HP170 
J\=f \=!W\Md PCR product amplified 
gro-1 from pMQ8 




1 kb 



153- 



gfp 3 UTR 



GRO-1 GFP 



-C 



GRO-1 ::GFP Fusion Protein 



200 aa



atcgtgttccaggtgcaactatatattgagcaggaggacgagttgtttgtttcatgctgcttaaaaataaaaatggaaaattgagtcaaaaagttgagat -9557 

aaaacaaattaaaacaattttctgaaaaataaacaactgaaatttgaagtaataaacaacacgcgaaaacgttatttcggagcatcgtttgagaagtaaa -9457 

actttttttcggcgcacccttgtgcgcagtttttatcttctcttttaatttaattttcaagctaaatctttctttttaaactttgaataaatatttaaat -9357 

MFRKLGSSGSLWKPKNPHSLE 21 

attcagaatgcaccaataaacctggaacaaaatcgata ftTGTTCCGCMGCnGGTTCT TCTGGGTCACTATGGMGCCGAAflMTCCGCATTCTTTGGA -9257 

Y L K Y L Q G V L T K N E K V T E N N K K I L V E A L R A I A E I 54 

ATACCTCAAATATTTACMGGAGTGTO -9157 

LIfJGDQNDASVFD F F L E R 72 

CTCATTTGGGGCGATCAGAATGATGCTTCGGTTTTTGAgtgagtttttttccaatgttttttttcaaatctgatgttgaatttcagTTTCTTCCTTGAGC -9057 

QMLLYFLKIMEQGKTPLNVQLLQTLNILFENIR 105 

GGCMTGCTTCTHATTO^ -8957 

T SHP171 

H E T S L Y FLLSNNHVNSII 123 

ACATGAAACTTCACTTTgtaagttttttatatggattttcgcttaaaattgccagttttcagATTTCCTTCTAAGTAACAATCATGTAAACTCGATTATT -8857 



S fl K F D L Q N D E I M A I Y I S F L K T L S F K L N P A T I B F F 157 
TCCCACAMTTCGATTTACAAAATGATGAGATCATGGCTTACTACATTAGTTTTCTGAAMCTCTTTCATTTAAACTGAATCCAXTACMTCCACTTCT -8757 



gop-1 continued...

F I E T T E E F P L L V E V L K L Y N W N E S H ? R I A V R I I L 190 

TCTTCMTGAAACGRCTGMGRATTTCCATTGTTGGTAGMGTTTTGMGCTTTA TMTTGGMTGMTCMTGGT TCGMTTGCTGTTAGAMTATTCT -8657 

SHP172 T 

L N I V R V Q D D S M I I F A I R H T K 210 

mAMTATTGTGAGAGTTCMGATGATTCAATGATTATTTTCGCTATCAAGCATACAAAAgttagtagaaaattattttgaaaaggtgtatttaagcaa -855) 



E y L S E L I D S L V G L S L E H D T F V R S A E N V L A N 240 

taaatattacagGMTATCTATCGGAGTTMTAGATTCTCTAGTTGGTCTCTCACTTGAAATGGACACATTTGTACGATCTGCTGAGAATGTGTTAGCTA -845? 

R E R L R G X V D D L I D L I B Y I G E L L D V E A V A E S L S I 213 

ATCGAGAGAGAnACGA GGAAAAGTGGATGATTTMT TGATTTGATTCAnATATTGGT GAACTATTGGATGTGGMGCTGTCGCCGAMGTTTATCAAT -8357 
SHP142 SHP173 T 

I V TTRYLSPLLLSSISPR 291 

TTTA(^tcagttttactgctggaaaatcaagtttttaatgttaaattttcagTMCMCACGATACTTMGCCCTCTATTACTTTCMGTATATCACCAA -8257 

RDNHSLLLTPISALFFFSEFLL 313 

GAAGAGATAATCATTCACTTCTACTCACTCCGATTTCTGCGTTATTTTTTTTCTCTGAATTTTTATTGgtgagttttaacatttaaaattacatttttct -8157 

I V R H B E T I Y T F L S S F I F D T Q H T L T ! H I I m 

aatttatttatttttcagATAGTTCGTCACCATGAAACAATATATACATTTTTATCATCTTTCCTATTTGACACTCAGAATACTTTGACGACCCATTGGA -8057 



RHNEKYCLEPITLSSPTGEYVNEDH 366 

TACGTCATMTGAGAMTATTGCTTAGAACCGATTACATTATCATCACCAACCGGAGAATATGTGAATGAAGACCAgtaagagctgaaattttaaaattt -7957 

VFFDFLLEAFDSSQADDSKAFYGLM 391 

ttgctttgaatatagtattttcagCGTATTTTTCGATTTTCTACTGGAAGCATTTGATTCCAGTCAAGCAGACGATTCGAAGGCATTCTATGGATTAATG -785? 



LIYSMFQNHA 401

CTGATTTATTCAATGTTTCAGAATAATGgtgagttttaaaaaattgatttgttaaattaaaatttccatttccaataactcctcttcagacagtaagttt -7751 

tcaatgttgtaaagttcctgttcatctgtgatcgttttcttcatttttttagttttgcatgaacagttttcaaatttttttgatatcatacagtaaatat -7657 

cgtcatccagataattttctatttaaaaaaaatgaataaaaagagggcgcgcagaaattgccgaagtaatgtaaatttaaagggacacatgcgtagcttg -7557 

ttgtgtgggtctcgccgcgctttgtttgatttatcttgttttctgctcaaagagctgtttttattttagcgttgaatgcttttttaccgttctcatcggc -7457 

tttttaataggaatatttaaaaaaaaaggtttaataaatcttcgtttttacaaaatccatctaagatttgcatttgtgaagctcaacaagtaaagtttta -7357 

agtaacattgttttttaaaaaacaattgaaccaaattttgccgaaacattaataacatgacgatactctataaaatattcctcttttcaaaataaatttt -7257 

DVGELLSAANFPVLKESTTTSLAQQN 427 

caaaaaaaatccatttttcagCCGATGTTGGAGMCTTCTATCTGTO -7157 

1 SHP174 

L A R L R I A S T S S I S K R T R A I T E I G V E A T E E D E I F 480 

TCTTGCTCGTCTCCGMTAGCATCTACGTCTTCCATATCAMGCGMCGAGAGCTATCACTG AMTTGGAGTAGMGCGACC GAGGMGATGAGATTTTT -7057 

SHP185 J 

HDVPEEQTL 469 

CATGATGTTCCTGMGAACAMCGTTGgtaagtaaataaatcaacattgattgttacacaaactttaatatttttaaatttgaaaattttcttcaaagtg -6957 

EDLVDDVLVDTENSAISDPE 489 

ctcaaaaatcctgtcgaaaattacagGAAGATCTGGTGGATGATGTATTGGTTGATACTGAAAATTCAGCAATAAGTGATCCAGAAgtgagtagaaaacg -6857 

P K N V E S E S R 498 

tgcatgtattaattattaaaaaaaaaatatagttttccccagttttccttgacctaaaactcagcaatttcagCCTAAAAACGTGGAGTCAGAATCTCGT -6757 
S R F Q S A V D E L P P P S T S G C D G R L F D A L S S I I K A V G 532
TCTCGATTTCAATCTGCTGTTGATGAGCTTCCACCTCCGTCGACTTCTGGATGTGATGGTCGACTTTTTGATGCACTTTCATCGATTATCAAAGCAGTTG -6657



TDDNRIRPITLELACLVIRQILMTVDDER 561 

GAACAGATGACAATCG AATTCGACCAATTACATTGGM CTTGCATGTCTTGTAA -6551 
SHP175 * 

VHTSLTKLCFEVRLKLLS 579 

aattcaaaattgagcaaaatcagaatM -6457 

SIGQMNGENLFLEWFEDEYAEFE 603 
CATCAATTGGACAATATGTTAATGGAGAGAATCTGTTTTTGGAGTGGTTTGAGGATGAATATGCAGMTTTGAAgtaagccaagaggtccgaaaataatt -6357 

VHHVNFDIIGHEMLLPPAATPLSNLLL 630 
taattcatcctttttattcagGTGAATCACGTGAATTTCGATATAATCGGTCACGAAATGCTTCTTCCTCCAGCTGCAACTCCTCTTTCGAATCTGCTAC -6257 

HRRLPSGFEERIRT Q I V 647 

TTCATMGCGATTGCCCAGTGGATTTGAAGAACGAATAAGAACTgtaggaaactttttaaatttgaaaattaattatatatatatttgcagCAAATCGTA -6157 

FYLHIRKLERDLTGEGDTEtPVRVLHSDQEPVAI 681 
TTCTACCTACATATTCGAAAATTGGAACGAGATTTGACCGGTGAAGGAGACACAGAATTACCTGTGAGAGTGTTGAATTCTGATCAGGAACCAGTTGCCA -6057 



G D C I N L H N S D L L S C T 696 

TCGGTGATTGTATTAATTTACgtgagttcatctgcatagaaaacaccatatttctactcaaattaacaattttcagATAATTCGGATCTTCTATCCTGCA -5957 

VVPQQLCSLGRPGDRLARFLVTDRLQLILVEPD 729 

CTGT GGTTCCTCMCAACTATGTTC TCTTGGAAMCCTGGTGATCGTCnGCTCGATTCCTTGTCACTGATAGACTTCAATTAATTCTTGTCGAACCGGA -5857 
' SHP176 

S R R A G B A I V R F V G L L Q D T T I N G D S T D S K V L H V V 762 

TTCTCGAAAAGCCGGATGGGCMTTGTTCGATTCGTAGGACTTCTTCMGATACAAC AATTAATGGAGATTCTACGGA TTCGAAAGTTTTGCATGTTGTG -5757 

SHP177 ? 

V E G Q P S R I R R R H P V L T A 779 

GTGGAAGGGCAACCCTCGAGAATTAAGgtaagaatactaacgggaaaaaaaaatcaaaaaattacttctgtttcagAAAAGACATCCGGTTTTAACTGCA -5657 
KFIFDDHIRCMAAKQRLTK 798
AAGTTCATATTCGATGATCACATTCGGTGTATGGCAGCAAAGCAACGGCTCACCAAGgtaacggaaaaaataaccaaaaagacggaaagttattgtaaat -5557



ggacgaaatcggcgaaattaattgaaaacgtttgaatttgccgctaaaaccaaacgaaaaccaaacgaaagcgaaatttaactatcccttcaggtagaat -5457 

G R Q T A R G L K L Q A I C S A L G V P R I D P A T 824 

atacattttatttctctttatagGGTCGCCAMCAGCACGTGGTCTGAAACTTCAGGCGATATGTTCAGCTCTTGGAGTTCCACGTATCGATCCAGCGAC -5357 

MTSSPRHNPFRIVKGCAPGSVRKTVSTSSSSSQ 857 

AATGACGTCATCACCACGAATGMTCCATTCAGAATTGTGAAAGGATGCGCACCGGGAAGTGTACGAAAAACTGTTTCCACATCATCATCGTCAAGCCAA -5257 



GRPGHYSANLRSASRNAGMIPDDPTQPSSSSERR 891 

GGACGTCCCGGACATTATTCTGCAAATCTTAG ATCAGCATCTAGAMTGCAGG AATGATACCAGATGATCCAACTCAACCGAGTAGTTCTTCGGAAAGAA -5157 

SHP178 J 

S . 892 

GATCCtagggatcaatatctcttcagtttcatcattttatgctgtaaattgtatttaagtattcctattctttgtagtactgtatttacacatcgtctag -5057 



ttaaaatcacaaatctccgaaaaaacaaaccagtgaacatgtgatatttctcttgcccatagttctcttttttttttgaaacaaaaacaattacttttat -4957 

A 

gctcacctattcgagccatatttttttcccaattaccggttgtttattttaatttcttttttttttctgtaaatctactttatttttaaaactgcatttg -4857 



agattgtgtatattttttcaaaatggttcaaatgccgaatctatctactt 



-4807 
^ MAEKAENLPSSSAEASE 1

tttaatcattattcaaacagaaaaaccgattatttattcagattctcaaaaATGGCTGAAmGCTGRAMTCTTCCATCTTCTTCGGCCGMGCTTCAG -410 

E P S P Q T G P N V H Q R P S I L V L G H A G S G K T T F V Q 4 

MGAGCCATCACCTCMACTGGACCAMTGTGMTCAMMCCATCGATTTTGGTTCTTGGAATGGCTGGTTCTGGAAAMCGACATTTGTTCAGgtaac -460 

R L T A F L H A R K T P P Y V I H L D P 6 

tttcattcaattttgagagttttcaaacattactattttcagCGTCTCACAGCATTCCTACATGCTCGTAAMCACCTCCATATGTGATTAATCTGGATC -450 

A V S K V P Y P V N V D I R D T V K Y R E V M K E F G H G P N G A 10 

CGGCAGTTAGCAMGTACCTTAT CCAGTGMTGTTGACftTTCGA GATACTGTGAMTACMGGARGTTATGAARGMTTCGGMTGGGACCAMTGGAGC -440 

T SHP179 

IMTCLNLMCTRFDKVIELINKRSSDFSVCLLDT 13 

AATTATGACATG TCTTAACCTGATGTGTACTCG TTTTGATAAAGTAATTGAGTTGATTAATAAGAGATCTTCTGATTTCTCAGTTTGTCTTCTTGATACT -430 

sis T 

P G Q I E A F T W S A S G S I I T D S L A S S R P T 16 

CCTGGACAAATTGAAGCRTTCACTTGGAGTGCTAGTGGATCTATTATCACTGATTCATT GGCAAGTAGCCATCCCACGgt aagggattttgatttatgaa -420 

T SHP143 

atctgcttgaaatgaaaaaagattctaataaatttttgacttttaaacattttttacagttatatttggtctattttctatcattaaaagcaaaatgaaa -410 



V V M Y I V D S A R A T H P f T F K S H 18 
aqtcqattctactccatatttattaatttcqacttttcaqGTG^TMTGTACATTGTGGATT CCGCTCGTGCCACAMTCCM CTACATTCATGTCCAAT -400 

T SHP144 
M L Y A C S I L Y R T K L P F I V V F N K A D I V K P T F A L K H N 21

ATGCTCTACGCATGTTCCATTCTCTACCGTACCAMCTTCCATTCATTGTCGTTTTCMCMAGCTGATATTGTCAAACCAACATTTGCACTCMATGGA -390 

QDFERFDEALEDARSSYMNDLSRSLSLVLDEFY 24 

TGCAAGATTTCGAMGATTTGATGAAGCTTTAGAGGATGCCAGMGCAGTTATATGMTGATTTGAGTCGTTCATTGAGTCTCGTTCTTGATGAATTCTA -380 

T 

SHP181 

C G L K T V CVSSATGEGFEDV 26 

TTGCGGACTGAAMCAGgtttttattcgaaataaaaccttttttaaataataaatttcagTTTGCGTCAGTTCTGCMCT(KA{aAGGATTCGAAGATGT -310 



M T A I D E S V E A Y K K E Y V P M Y E K V L A E K R L L D E E E 29 
MTGACAGCMTCGATGAMGTGTTGAAGCATACAAAAAAGAATATGTTCCAATGTATGAAAAAGTGTTGGCTGAGAAAAAACTATTGGATGAGGAGGAG -360 



R K R R D E E TLRGRAVRDLNKV 31 

AGMGAfflGAGATGMGAGgtaattgtagtaatttaattctgattatcttcaaattttcagACTCTGAMGGmGCTGTTCACGACCTGMCM -350 

ANPDEFLESELNSKIDRIHLGGVDEENEEDAEL 35 

TCGCCMTCC CGACGMTTTCTGGAGTCGG AGTTGMTTCAAAAATCGATAGAATTCATTTGGGCGGAGTCGATGAAGAGAATGAGGAGGATGCTGAACT -340 

SHP182 T 

E R S ' 35 

CGAAAGATCCtgattttctttttgtttttgaatttttattctattttgatccctgtttacttcttattgttctcattttgttgcgttgttttacatttta -330 



polyA 

ctcatttttgcataaacttgttgcaaaaataatataatttttgatctggaaatggttttaaaccttaacctttcatatattaataattttttttcaaaa -320 



aaacgttctaaaaaggttcctcattttttcaatataggaaattttgaaga -315 
SL2

"\ MSEKTFHK 8 

tcttttccaaaaatgaggttcttcgcttgaaaagccaacatttaaaacctttttttttccagaaacctagtggttaATGTCTGAAAAGACGTTCCACAAG -3057 

A Q T I I A I A S G V P S I V E A V Q F H G V R I T K N D A L V K E 42 
GCACAGACCATCCGTGCAMGGCATCCGGAGTGCCTTCAATCGTCGMGCTGTACAGTTTCATGGAGTTCGCATCACAAAAMCGATGCTTTGGTTAAGG -2957 

? S E L Y R 48 

AGgtactacccaaatttcaaaatgttgcacaattcaattgaaaatataaattgtgaattaaattcaacttacatgttttttcagGTTTCCGAATTATACA -285 

SKNLDELVHNSHLAARHLQEVGLMDHAVALIDT 81 
GAAGTAAAMTCTAGATGAACTTGTTCATMCTCTCATCTGGCGGCTCGTCATCTTCMGMGTT GGATTMTGGATMTGCAGTT GCTCTAATTGATAC -275 

T Si 

SPSSNEGYVVNFLVREPKSFTAGVKAGVSTNGD 114 
ATCTCCMGCTCAMTGMGGATATGTTGTCMTTTCCTAGTTCGAGAACCAAAATCATTCACTGCTGGAGTCAAAGCAGGAGTTTCAACGAATGGAGAT -26 

ADVSLNAGKQSVGGRGEAINTQYTYTVK 14 

GCGGATGTCAGTTTAAATGCCGGAAMCAAAGTGTTGGA G(^CGAGGAGAGGCMTCMT ACACAGTATACATATACTGTAMGgtaaggacgagagtt^ -255 

T SHP145 

gcactgccagtttggcatgttctcccaatattttttaattataaaatttggaagtataaaaaaatgtttgcttcatctaaaaatagcctttttcacatga -245 



aaaaaattgaaaaaaagtgctcaaaaatttcagaaatttccaatttccaaacaattttggagaactttcaaaaatttttccaactgaaattaaagctata -235 
ttctatcactaaattttatacaagtcttaagagaaaatgatgaagtggctcattttgtagaatttcctaaaaaataatatcttcagGGCGATCACTGCTT -225



N I S A I K P F L G W Q K Y S N V S A T L I R S L A H H P « N Q S 180 

C MCRTTTCCGCAATCRMCC ATTCCTGGGATGGCAAAMTATTCGAATGTATCAGCGACTCTATACCGTTCftCTTGCACftTATGCCA TGGAATCMTCR -215 
SHPi T ? SHP146 

DVDENAAVLAYNGQLWNQKLLHQVKLNA 208 

GATGTTGATGAGAATGCAGCTGTTCTTGCATATAATGGACAACTA'FGGMTCAAAAGCTTTTGCATCAAGTCAAATTGAATGCGgtaaagtattataagt -205 



I « R T L R A T R D A A F S V R E Q A G H T L 23 

gttttgtccaaactatgatacagttcttcagATATGGAGAACACTTCGTGCCACTCGAGATGCCGCATTTTCAGTTCGTGAACAAGCCGGACACACTTTG -195 

K F S L E S A V A V D T R D R P I L A S R G I L A 25 

AMTTCTCGTTGGAGMTGCTCTAGCTGTTGATACMGAGATAGACCTATTCTTGCMGTCGTGGAATTCTTGgtaagagtaacaacgactatttttaaa -185 



aaatatctttttcgaaaaaattacgaacgaaaaaaaactgtattatgtacccaaacgcgaaattttgcagttcttgcgcgttcttgttgataaaaaatat -175 

R F A Q 26 

gtaaaaaattggaaaaactacgaaaagtcgataaaaattccgtaccaaccggaaaatgtttcattaatttctcttccttttttcagCTCGTTTTGCTCAA -165 

EYAGVFGDASFVKNTLDLQ 279 

GAGTACG CAGGAGTATTTGGTGATGCGT CATTTGTGAAGAATACATTAGATTTACAGgtaacaaccttatttcaacaattatttcaaattCtattaaaaa -155 
SHP139 T 

A A A P L P L G F I L A A S F Q A K H L K G L G D R E V H I L 31 

taattccagGCA GCTGCCCCTCTTCCACTCGGT TTCATTCTTGCCGCCTCATTCCAAGCGAAACATTTGAAAGGACTCGGAGATCGAGAAGTTCATATTT -145 

SHP140 ' 
DRCYLGGQQDVRGFGLNTIG 330
TGGATAGATGTTftTTTGGGT GGACAACAGGATGTTCGAGGRTTTGGTCTGAATACTATTGGAgtgagttttaacgaaattctcttgaaagtcaaataatc -1357 

T SHP184 

V K A D N S C L G G G A S L A G V V B L Y R P L I P P 1 M L F 361 
attttcagGTTAAAGCAGATAACAGTTGTCTTGGAGGAGGTGCTTCACTTGCTGGTGTCGTTCATTTGTATCGGCCATTGATTCCACCAAATATGCTATT -1251 

A H A F L A S G S V A S V H S K N L V Q Q L Q D T Q R V S A G F G 394 
TGCACACGCATTCCTTGCATCT M -1151 

SHP163 } 

gagtttgaaatttaggaaacatttggatgaaatqtattttttaaaaatagatcagctttatttatttgaaaaaaaacgctcattaatcaatagtgatagt -1057 

tccattctgagtttcttcttcttcctcgcggaatacaatttttgacttgttcgcatccttcttgtgtactttgtcaccaatcttctcatcaactaaatct -957 

cgaaactgaaaaaatttcaaaattattccaaaaaatattgatgcagactacctttttgatggcttctggtacgtttctagcgtcgaatggattggctcct -857 



ccaataattaaagtctcgttcggtagtttagccagacggacggtgtgcttcaacatttttctaattaatctatttcaattcaagtcactcactctctctt -757 
gacgtcttcttctatattccaagaactctgcagaaaatccgtgtccgccttgtgtgtttctagttggcgtcggaggattcacgggtccaagacgaatgga -657

tgtctaaaaaatgttatatttttgcataaagaaaacaccataccttcaccactttttgagttgtgggcgttctgaatggaattgatcgattattattgct -551 

ctttcttgatttgcttctatcagctgcgtaatgaggtgttctaaagatcagctttaattcatttggacaagtgctcctctaataaacttaccctgtactc -451 

atttttgaaacgatttacgatgataagattgaaagtggaagttaaatttagtctttcaaagttgaaataaaatcttcataaataaataaatttaaatgaa -351 

L A F V F K S 401 

agattaaataaattaacgttcacgtagttaaaaaaataatttaaatcttaaacttctaataaaaaatctcaattttccagGACTCGCATTCGTGTTCAAAA -257 

IFRLELNYTYPLKYVLGDSIiIiGGFHIGAGVHFL 434 
GTATTTTCCGGCTGGAACTCMCTACACGTATCCATTGAMTATGTGCTCGGCGATTCATTGCTCGGTGGATTCCATATTGGAGCTGGTGTCMCTTCTT -151 

1 

Gtagagattaattggatgcaagcacccctcaaaaagatttttttgaaaaacgataaattcacagaatttcagttctttttctcccccttttattgttatt -57 
SHP134 ? 

ttcatcgtaa tgctgtgctagaagtcagag taaatatgagtttttttgtgttctaggaattccattttttcaggaagcaaatttaataaaaattatcgaa 44 
SHP164 f 

polyA 

r 

tttcttgctctaaagatgttgtacattttatggaaatgttcgtatagtaa 94 

T SHP135 



SL2

-\ MSLRKINFVTG 11 
ttcgaacactttatatttctcgttttaaaactgtcggtgttttatagtaaactatcttcagaaaaaaATGAGCCTACGA ARMTCMTTTCGTMCTGGA 194 

f SHP118 * 



NVKKLEEVKftlLKNFE 21 
AACGTGAAGAAGCTTGAAGMGTCAAGGCTATTTTGMGMTTTCGAGgtaaaatatatttgatattattcgaacgcgaaattttgcgccaaaagtacga 294 



tgcctggtctcaacacgacaatattttgttaaatacaaacgaatgtgcgccttcaaagaaaagtttcaatctttcgttgccgtggagatatttttagagt 394 



VSNVDVDLDEF 38 
ttttgtttaaattatatatttgtcgtatcgaaaccgggtaccgtaatcaatcaattaaatattttcagGTTTCAAACGTGGATGTCGATTTGGATGMTT 494 

SHP165 



QGEPEFIAERKCREAVEftVKGPVL 62 
CCAAGGAGAACCCGAATTTATTGCCGAAAGAAAGTGCCGTGAGGCTGTTGMGCTGTAAMGGGCCCGTTTTGgtatggaaaattgtatttgttctaaaa 594 



VEDTSLCFHAMGGLPGPYIKBFLKNLRPE 91 
attgtcaaatttcagGTCGMGACACMGTTTATGCTTCMCGCMTGGGCGGTCTTCCTGGA CCTTATATCMGTGGTTTTTG MGMTTTGAAACCAG 694 

SHP129 
G L H N M L A GFSDKTAIAQCIF 111

MGGACTACATMTATGCTAGgtaaatattttaattttttgaaaaaacttatttttcagCCGGATTTTCTGACAMACCGCCTATGCTCAATGCATCTTT 794 



AYTEGLGRPIHVFAG 126 
GCGTACACTGAAGGACTCGGAAAACCTATTCATGTATTTGCTGgtatgattttttgaatttaattctttaattttatatgttaatttagttgtttcattc 894 



KCPGQIVAPRGDTAFGWDP 145 
CtcaatttatgagagatttttttttcaatttttctatttcagGAAMTGTCCTGGTCAMTTGTTGCTCCACGT GGTGATACTGCTTTTGGATG GGATCC 994 

T SHP130 



CFQPDGFKETFGEMDKDVKNEISHRAKALELLK 178 
ATGCTTCCA(3CCAGATGOT 1094 

SHP119 



T SHP120 T 



E Y F Q N H • 184 
GMTATTTTCAGMTMTtaaattattttttctcatctatgcaatttcttgaaaatttgttaagtttccgttgttatgcatttgcttttatttaaaaaaa 1194 



polyA 

r 

aaagaatatttttacattaatattagatatgagaaaagagtaatttctggattttaaccttcctacaaaagaatatttatattttttgtatgatttttta 1294 



SHP93 
SEQUENCE LISTING

(1) GENERAL INFORMATION: 
(i) APPLICANT: McGILL UNIVERSITY 
(ii) TITLE OF INVENTION: THE C. ELEGANS gro-1 GENE 
(iii) NUMBER OF SEQUENCES: 62

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SWABEY OGILVY RENAULT 

(B) STREET: 1981 McGill College Avenue - Suite 160 

(C) CITY: Montreal 

(D) STATE: QC 

(E) COUNTRY: Canada 

(F) ZIP: H3A 2Y3

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: Windows 

(D) SOFTWARE: FastSEQ for Windows Version 2.0b 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/CA98/00803

(B) FILING DATE: 20-AUG-1998 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: CA 2,210,251

(B) FILING DATE: 25-AUG-1997 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Cote, France 

(B) REGISTRATION NUMBER: 4166 

(C) REFERENCE/DOCKET NUMBER: 1770-179 "US" FC/gc

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 514 845-7126

(B) TELEFAX: 514 288-8389 

(C) TELEX: 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14458 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GCAAAATTTG CTAAGATGAA GCGCCGGCTT GTTACATTGC TTTTCAGAGT CGATTGGTTC 6 0 

AAAATTGTCA ATTTTATCCA AAATAGAGTG CATTGTGTGT ACAATAACTA AAGAATCATC 120

CATATCTGGT CCAACACAAC ATTGATGGAA TACTGGATCA ATTGTCTAAA AAAATATCAA 18 0 

TAGAATAATG AAACATTTTC AGAATTCATT ACCGTCAATG TCAGATAGTC ATTCCTTGAG 24 0 

TATTTTGTGG ATGCTTTGAA AATTCTTCGC TGGGCCATAT CTGTTGGATA ATCTGAAAAA 300

CGCAATAAAT TTCATCGAAA ATGCCTATTA AATTGAATTA CCTTCTTCTT CATCATTTCC 360

TAACAATTCA TGCTCTTTTT GTGCTTGACT TGTGACCAAT TCTTTAAATT CAATTAAATC 42 0 

GTCAATATCC TTTTGTACTA AATCCATCTT GATATTCAAT ATATCTTTGT CAGTATAGTA 48 0 

TTCAGCGTAT CTGAAATTTC GAATTTATTT TTCTAATTCC CAAGAAAAAT AATTAATAAG 54 0 

AATACCTTAA CGAATTATTA TCCAATATAT CATCATTTGC CACATCTGGA AGACGCTGAG 60 0 

GAACTGTTTG AGCAGCTTGG AGGTAGTCGT CATCGTCTCT GGAAATTGTT ATTTTCAATT 66 0 

TCAAAAAAAA AACTTTACTT ACGAAATATA CTCATTTGAT GCAATCCACG GATCAAAACG 72 0 

ACGTCTTTGC ATCTTTGAAT CATTTTCCGC ATGGCACCGC ATCACTTCTT TCTTATGATT 78 0 

ATTTTCTAAC GTTTTTGAAA ATTCGACGTG CTCTTCACAA CGGCCGCCAT GTTTCGCAAG 84 0 

TTCTTCTTTT GATCGTATCT AAAATTTTAA ATTTGAAAAA AAGCTTACTA TCAAATTTTC 90 0 

GTATTTTTTC TCACCTGCTT ACACCGAACA AGCGTTCGAT ACGAAGCATA ATTACATTGT 960 

CCATACTTAT TTTTGTCGTA TTCATTGGCA ACAAGACGGA ATCGTGTTCC AGGTGCAACT 102 0 

ATATATTGAG CAGGAGGACG AGTTGTTTGT TTCATGCTGC TTAAAAATAA AAATGGAAAA 10 8 0 

TTGAGTCAAA AAGTTGAGAT AAAACAAATT AAAACAATTT TCTGAAAAAT AAACAACTGA 114 0 

AATTTGAAGT AATAAACAAC ACGCGAAAAC GTTATTTCGG AGCATCGTTT GAGAAGTAAA 12 00 

ACTTTTTTTC GGCGCACCCT TGTGCGCAGT TTTTATCTTC TCTTTTAATT TAATTTTCAA 12 60 

GCTAAATCTT TCTTTTTAAA CTTTGAATAA ATATTTAAAT ATTCAGAATG CACCAATAAA 132 0 

CCTGGAACAA AATCGATAAT GTTCCGCAAG CTTGGTTCTT CTGGGTCACT ATGGAAGCCG 1380

AAAAATCCGC ATTCTTTGGA ATACCTCAAA TATTTACAAG GAGTGCTCAC AAAAAATGAG 144 0 

AAAGTTACGG AAAACAATAA GAAAATATTA GTAGAAGCAT TACGAGCTAT CGCAGAAATT 150 0 

CTCATTTGGG GCGATCAGAA TGATGCTTCG GTTTTTGAGT GAGTTTTTTT CCAATGTTTT 15 60 

TTTTCAAATC TGATGTTGAA TTTCAGTTTC TTCCTTGAGC GGCAAATGCT TCTTTATTTC 162 0 

TTGAAAATTA TGGAACAAGG AAACACACCA CTAAATGTAC AATTACTGCA GACTTTGAAC 1680

ATTTTATTCG AAAATATTCG ACATGAAACT TCACTTTGTA AGTTTTTTAT ATGGATTTTC 174 0 

GCTTAAAATT GCCAGTTTTC AGATTTCCTT CTAAGTAACA ATCATGTAAA CTCGATTATT 18 00 

TCCCACAAAT TCGATTTACA AAATGATGAG ATCATGGCTT ACTACATTAG TTTTCTGAAA 18 60 

ACTCTTTCAT TTAAACTGAA TCCAGCTACA ATCCACTTCT TCTTCAATGA AACGACTGAA 192 0 

GAATTTCCAT TGTTGGTAGA AGTTTTGAAG CTTTATAATT GGAATGAATC AATGGTTCGA 198 0 

ATTGCTGTTA GAAATATTCT TTTAAATATT GTGAGAGTTC AAGATGATTC AATGATTATT 2 04 0 

TTCGCTATCA AGCATACAAA AGTTAGTAGA AAATTATTTT GAAAAGGTGT ATTTAAGCAA 2100 

TAAATATTAC AGGAATATCT ATCGGAGTTA ATAGATTCTC TAGTTGGTCT CTCACTTGAA 2160 

ATGGACACAT TTGTACGATC TGCTGAGAAT GTGTTAGCTA ATCGAGAGAG ATTACGAGGA 2 22 0 

AAAGTGGATG ATTTAATTGA TTTGATTCAT TATATTGGTG AACTATTGGA TGTGGAAGCT 22 80 

GTCGCCGAAA GTTTATCAAT TTTAGGTCAG TTTTACTGCT GGAAAATCAA GTTTTTAATG 2 34 0 

TTAAATTTTC AGTAACAACA CGATACTTAA GCCCTCTATT ACTTTCAAGT ATATCACCAA 24 00 

GAAGAGATAA TCATTCACTT CTACTCACTC CGATTTCTGC GTTATTTTTT TTCTCTGAAT 24 6 0 

TTTTATTGGT GAGTTTTAAC ATTTAAAATT ACATTTTTCT AATTTATTTA TTTTTCAGAT 2 52 0 

AGTTCGTCAC CATGAAACAA TATATACATT TTTATCATCT TTCCTATTTG ACACTCAGAA 2 5 80 

TACTTTGACG ACCCATTGGA TACGTCATAA TGAGAAATAT TGCTTAGAAC CGATTACATT 2640

ATCATCACCA ACCGGAGAAT ATGTGAATGA AGACCAGTAA GAGCTGAAAT TTTAAAATTT 2 70 0 

TTGCTTTGAA TATAGTATTT TCAGCGTATT TTTCGATTTT CTACTGGAAG CATTTGATTC 2 760 

CAGTCAAGCA GACGATTCGA AGGCATTCTA TGGATTAATG CTGATTTATT CAATGTTTCA 2 82 0 

GAATAATGGT GAGTTTTAAA AAATTGATTT GTTAAATTAA AATTTCCATT TCCAATAACT 2 8 80 

CCTCTTCAGA CAGTAAGTTT TCAATGTTGT AAAGTTCCTG TTCATCTGTG ATCGTTTTCT 2 94 0 

TCATTTTTTT AGTTTTGCAT GAACAGTTTT CAAATTTTTT TGATATCATA CAGTAAATAT 3 00 0 

CGTCATCCAG ATAATTTTCT ATTTAAAAAA AATGAATAAA AAGAGGGCGC GCAGAAATTG 3 06 0 

CCGAAGTAAT GTAAATTTAA AGGGACACAT GCGTAGCTTG TTGTGTGGGT CTCGCCGCGC 312 0 
TTTGTTTGAT TTATCTTGTT TTCTGCTCAA AGAGCTGTTT TTATTTTAGC GTTGAATGCT 3180

TTTTTACCGT TCTCATCGGC TTTTTAATAG GAATATTTAA AAAAAAAGGT TTAATAAATC 3240

TTCGTTTTTA CAAAATCCAT CTAAGATTTG CATTTGTGAA GCTCAACAAG TAAAGTTTTA 33 00 

AGTAACATTG TTTTTTAAAA AACAATTGAA CCAAATTTTG CCGAAACATT AATAACATGA 3 3 60 

CGATACTCTA TAAAATATTC CTCTTTTCAA AATAAATTTT CAAAAAAAAT CCATTTTTCA 342 0 

GCCGATGTTG GAGAACTTCT ATCTGCTGCC AACTTCCCAG TGCTCAAAGA ATCAACGACA 3480

ACTTCATTAG CTCAACAGAA TCTTGCTCGT CTCCGAATAG CATCTACGTC TTCCATATCA 3540

AAGCGAACGA GAGCTATCAC TGAAATTGGA GTAGAAGCGA CCGAGGAAGA TGAGATTTTT 3 60 0 

CATGATGTTC CTGAAGAACA AACGTTGGTA AGTAAATAAA TCAACATTGA TTGTTACACA 3660 

AACTTTAATA TTTTTAAATT TGAAAATTTT CTTCAAAGTG CTCAAAAATC CTGTCGAAAA 3 72 0 

TTACAGGAAG ATCTGGTGGA TGATGTATTG GTTGATACTG AAAATTCAGC AATAAGTGAT 3 780 

CCAGAAGTGA GTAGAAAACG TGCATGTATT AATTATTAAA AAAAAAATAT AGTTTTCCCC 3840

AGTTTTCCTT GACCTAAAAC TCAGCAATTT CAGCCTAAAA ACGTGGAGTC AGAATCTCGT 3 900 

TCTCGATTTC AATCTGCTGT TGATGAGCTT CCACCTCCGT CGACTTCTGG ATGTGATGGT 3 960 

CGACTTTTTG ATGCACTTTC ATCGATTATC AAAGCAGTTG GAACAGATGA CAATCGAATT 4 02 0 

CGACCAATTA CATTGGAACT TGCATGTCTT GTAATTCGGC AAATTTTAAT GACTGTTGAT 4080

GATGAAAAAG TAAGATTACA AATTCAAAAT TGAGCAAAAT CAGAATCTAA ATTTCATAAA 414 0 

TTGTTCAGGT ACATACCAGT TTAACGAAAT TATGCTTCGA AGTTCGTCTA AAACTTTTAT 42 0 0 

CATCAATTGG ACAATATGTT AATGGAGAGA ATCTGTTTTT GGAGTGGTTT GAGGATGAAT 4260

ATGCAGAATT TGAAGTAAGC CAAGAGGTCC GAAAATAATT TAATTCATCC TTTTTATTCA 432 0 

GGTGAATCAC GTGAATTTCG ATATAATCGG TCACGAAATG CTTCTTCCTC CAGCTGCAAC 43 8 0 

TCCTCTTTCG AATCTGCTAC TTCATAAGCG ATTGCCCAGT GGATTTGAAG AACGAATAAG 444 0 

AACTGTAGGA AACTTTTTAA ATTTGAAAAT TAATTATATA TATATTTGCA GCAAATCGTA 4 5 00 

TTCTACCTAC ATATTCGAAA ATTGGAACGA GATTTGACCG GTGAAGGAGA CACAGAATTA 4560 

CCTGTGAGAG TGTTGAATTC TGATCAGGAA CCAGTTGCCA TCGGTGATTG TATTAATTTA 4 62 0 

CGTGAGTTCA TCTGCATAGA AAACACCATA TTTCTACTCA AATTAACAAT TTTCAGATAA 4680 

TTCGGATCTT CTATCCTGCA CTGTGGTTCC TCAACAACTA TGTTCTCTTG GAAAACCTGG 4 74 0 

TGATCGTCTT GCTCGATTCC TTGTCACTGA TAGACTTCAA TTAATTCTTG TCGAACCGGA 4 80 0 

TTCTCGAAAA GCCGGATGGG CAATTGTTCG ATTCGTAGGA CTTCTTCAAG ATACAACAAT 4860 

TAATGGAGAT TCTACGGATT CGAAAGTTTT GCATGTTGTG GTGGAAGGGC AACCCTCGAG 4 92 0 

AATTAAGGTA AGAATACTAA CGGGAAAAAA AAATCAAAAA ATTACTTCTG TTTCAGAAAA 4 98 0 

GACATCCGGT TTTAACTGCA AAGTTCATAT TCGATGATCA CATTCGGTGT ATGGCAGCAA 5040

AGCAACGGCT CACCAAGGTA ACGGAAAAAA TAACCAAAAA GACGGAAAGT TATTGTAAAT 5100 

GGACGAAATC GGCGAAATTA ATTGAAAACG TTTGAATTTG CCGCTAAAAC CAAACGAAAA 5160 

CCAAACGAAA GCGAAATTTA ACTATCCCTT CAGGTAGAAT ATACATTTTA TTTCTCTTTA 5220

TAGGGTCGCC AAACAGCACG TGGTCTGAAA CTTCAGGCGA TATGTTCAGC TCTTGGAGTT 52 8 0 

CCACGTATCG ATCCAGCGAC AATGACGTCA TCACCACGAA TGAATCCATT CAGAATTGTG 5340

AAAGGATGCG CACCGGGAAG TGTACGAAAA ACTGTTTCCA CATCATCATC GTCAAGCCAA 5400

GGACGTCCCG GACATTATTC TGCAAATCTT AGATCAGCAT CTAGAAATGC AGGAATGATA 5460

CCAGATGATC CAACTCAACC GAGTAGTTCT TCGGAAAGAA GATCCTAGGG ATCAATATCT 552 0 

CTTCAGTTTC ATCATTTTAT GCTGTAAATT GTATTTAAGT ATTCCTATTC TTTGTAGTAC 55 80 

TGTATTTACA CATCGTCTAG TTAAAATCAC AAATCTCCGA AAAAACAAAC CAGTGAACAT 5 64 0 

GTGATATTTC TCTTGCCCAT AGTTCTCTTT TTTTTTTGAA ACAAAAACAA TTACTTTTAT 5 7 00 

GCTCACCTAT TCGAGCCATA TTTTTTTCCC AATTACCGGT TGTTTATTTT AATTTCTTTT 57 60 

TTTTTTCTGT AAATCTACTT TATTTTTAAA ACTGCATTTG AGATTGTGTA TATTTTTTCA 5 82 0 

AAATGGTTCA AATGCCGAAT CTATCTACTT TTTAATCATT ATTCAAACAG AAAAACCGAT 5880

TATTTATTCA GATTCTCAAA AATGGCTGAA AAAGCTGAAA ATCTTCCATC TTCTTCGGCC 5 94 0 

GAAGCTTCAG AAGAGCCATC ACCTCAAACT GGACCAAATG TGAATCAAAA ACCATCGATT 6000

TTGGTTCTTG GAATGGCTGG TTCTGGAAAA ACGACATTTG TTCAGGTAAC TTTCATTCAA 60 60 

TTTTGAGAGT TTTCAAACAT TACTATTTTC AGCGTCTCAC AGCATTCCTA CATGCTCGTA 612 0 

AAACACCTCC ATATGTGATT AATCTGGATC CGGCAGTTAG CAAAGTACCT TATCCAGTGA 6180 

ATGTTGACAT TCGAGATACT GTGAAATACA AGGAAGTTAT GAAAGAATTC GGAATGGGAC 624 0 

CAAATGGAGC AATTATGACA TGTCTTAACC TGATGTGTAC TCGTTTTGAT AAAGTAATTG 63 00 

AGTTGATTAA TAAGAGATCT TCTGATTTCT CAGTTTGTCT TCTTGATACT CCTGGACAAA 63 60 

TTGAAGCATT CACTTGGAGT GCTAGTGGAT CTATTATCAC TGATTCATTG GCAAGTAGCC 642 0 

ATCCCACGGT AAGGGATTTT GATTTATGAA ATCTGCTTGA AATGAAAAAA GATTCTAATA 64 8 0 
AATTTTTGAC TTTTAAACAT TTTTTACAGT TATATTTGGT CTATTTTCTA TCATTAAAAG 6540

CAAAATGAAA AGTCGATTCT ACTCCATATT TATTAATTTC GACTTTTCAG GTGGTAATGT 6600 

ACATTGTGGA TTCCGCTCGT GCCACAAATC CAACTACATT CATGTCCAAT ATGCTCTACG 666 0 

CATGTTCCAT TCTCTACCGT ACCAAACTTC CATTCATTGT CGTTTTCAAC AAAGCTGATA 672 0 

TTGTCAAACC AACATTTGCA CTCAAATGGA TGCAAGATTT CGAAAGATTT GATGAAGCTT 67 8 0 

TAGAGGATGC CAGAAGCAGT TATATGAATG ATTTGAGTCG TTCATTGAGT CTCGTTCTTG 684 0 

ATGAATTCTA TTGCGGACTG AAAACAGGTT TTTATTCGAA ATAAAACCTT TTTTAAATAA 6900 

TAAATTTCAG TTTGCGTCAG TTCTGCAACT GGAGAAGGAT TCGAAGATGT AATGACAGCA 6960 

ATCGATGAAA GTGTTGAAGC ATACAAAAAA GAATATGTTC CAATGTATGA AAAAGTGTTG 702 0 

GCTGAGAAAA AACTATTGGA TGAGGAGGAG AGAAAGAAAA GAGATGAAGA GGTAATTGTA 70 8 0 

GTAATTTAAT TCTGATTATC TTCAAATTTT CAGACTCTGA AAGGAAAAGC TGTTCACGAC 714 0 

CTGAACAAAG TCGCCAATCC CGACGAATTT CTGGAGTCGG AGTTGAATTC AAAAATCGAT 72 0 0 

AGAATTCATT TGGGCGGAGT CGATGAAGAG AATGAGGAGG ATGCTGAACT CGAAAGATCC 72 6 0 

TGATTTTCTT TTTGTTTTTG AATTTTTATT CTATTTTGAT CCCTGTTTAC TTCTTATTGT 732 0 

TCTCATTTTG TTGCGTTGTT TTACATTTTA CTCATTTTTG CATAAACTTG TTGCAAAAAT 7 3 80 

CAATATAATT TTTGATCTGG AAATGGTTTT AAACCTTAAC CTTTCATATA TTAATAATTT 744 0 

TTTTTCAAAA AAACGTTCTA AAAAGGTTCC TCATTTTTTC AATATAGGAA ATTTTGAAGA 75 00 

TCTTTTCCAA AAATGAGGTT CTTCGCTTGA AAAGCCAACA TTTAAAACCT TTTTTTTTCC 7560

AGAAACCTAG TGGTTAATGT CTGAAAAGAC GTTCCACAAG GCACAGACCA TCCGTGCAAA 762 0 

GGCATCCGGA GTGCCTTCAA TCGTCGAAGC TGTACAGTTT CATGGAGTTC GCATCACAAA 7680 

AAACGATGCT TTGGTTAAGG AGGTACTACC CAAATTTCAA AATGTTGCAC AATTCAATTG 774 0 

AAAATATAAA TTGTGAATTA AATTCAACTT ACATGTTTTT TCAGGTTTCC GAATTATACA 7 8 00 

GAAGTAAAAA TCTAGATGAA CTTGTTCATA ACTCTCATCT GGCGGCTCGT CATCTTCAAG 78 60 

AAGTTGGATT AATGGATAAT GCAGTTGCTC TAATTGATAC ATCTCCAAGC TCAAATGAAG 792 0 

GATATGTTGT CAATTTCCTA GTTCGAGAAC CAAAATCATT CACTGCTGGA GTCAAAGCAG 7 98 0 

GAGTTTCAAC GAATGGAGAT GCGGATGTCA GTTTAAATGC CGGAAAACAA AGTGTTGGAG 804 0 

GACGAGGAGA GGCAATCAAT ACACAGTATA CATATACTGT AAAGGTAAGG ACGAGAGTTG 810 0 

GCACTGCCAG TTTGGCATGT TCTCCCAATA TTTTTTAATT ATAAAATTTG GAAGTATAAA 8160 

AAAATGTTTG CTTCATCTAA AAATAGCCTT TTTCACATGA AAAAAATTGA AAAAAAGTGC 82 2 0 

TCAAAAATTT CAGAAATTTC CAATTTCCAA ACAATTTTGG AGAACTTTCA AAAATTTTTC 82 8 0 

CAACTGAAAT TAAAGCTATA TTCTATCACT AAATTTTATA CAAGTCTTAA GAGAAAATGA 834 0 

TGAAGTGGCT GATTTTGTAG AATTTCCTAA AAAATAATAT CTTCAGGGCG ATCACTGCTT 84 0 0 

CAACATTTCC GCAATCAAAC CATTCCTGGG ATGGCAAAAA TATTCGAATG TATCAGCGAC 846 0 

TCTATACCGT TCACTTGCAC ATATGCCATG GAATCAATCA GATGTTGATG AGAATGCAGC 852 0 

TGTTCTTGCA TATAATGGAC AACTATGGAA TCAAAAGCTT TTGCATCAAG TCAAATTGAA 85 8 0 

TGCGGTAAAG TATTATAAGT GTTTTGTCCA AACTATGATA CAGTTCTTCA GATATGGAGA 864 0 

ACACTTCGTG CCACTCGAGA TGCCGCATTT TCAGTTCGTG AACAAGCCGG ACACACTTTG 87 0 0 

AAATTCTCGT TGGAGAATGC TGTAGCTGTT GATACAAGAG ATAGACCTAT TCTTGCAAGT 8760 

CGTGGAATTC TTGGTAAGAG TAACAACGAC TATTTTTAAA AAATATCTTT TTCGAAAAAA 8 82 0 

TTACGAACGA AAAAAAACTG TATTATGTAC CCAAACGCGA AATTTTGCAG TTCTTGCGCG 88 80 

TTCTTGTTGA TAAAAAATAT GTAAAAAATT GGAAAAACTA CGAAAAGTCG ATAAAAATTC 8 94 0 

CGTACCAACC GGAAAATGTT TCATTAATTT CTCTTCCTTT TTTCAGCTCG TTTTGCTCAA 9000 

GAGTACGCAG GAGTATTTGG TGATGCGTCA TTTGTGAAGA ATACATTAGA TTTACAGGTA 90 60 

ACAACCTTAT TTCAACAATT ATTTCAAATT CTATTAAAAA TAATTCCAGG CAGCTGCCCC 912 0 

TCTTCCACTC GGTTTCATTC TTGCCGCCTC ATTCCAAGCG AAACATTTGA AAGGACTCGG 9180 

AGATCGAGAA GTTCATATTT TGGATAGATG TTATTTGGGT GGACAACAGG ATGTTCGAGG 9240

ATTTGGTCTG AATACTATTG GAGTGAGTTT TAACGAAATT CTCTTGAAAG TCAAATAATC 93 0 0 

ATTTTCAGGT TAAAGCAGAT AACAGTTGTC TTGGAGGAGG TGCTTCACTT GCTGGTGTCG 93 60 

TTCATTTGTA TCGGCCATTG ATTCCACCAA ATATGCTATT TGCACACGCA TTCCTTGCAT 94 2 0 

CTGGAAGTGT TGCATCAGTT CATTCCAAAA ATTTGGTGCA ACAATTACAG GATACTCAAC 9480

GAGTATCAGC CGGATTTGGT GAGTTTGAAA TTTAGGAAAG ATTTGGATGA AATGTATTTT 954 0 

TTAAAAATAG ATCAGCTTTA TTTATTTGAA AAAAAACGCT CATTAATCAA TAGTGATAGT 9600 

TCCATTCTGA GTTTCTTCTT CTTCCTCGCG GAATACAATT TTTGACTTGT TCGCATCCTT 9 660 

CTTGTGTACT TTGTCACCAA TCTTCTCATC AACTAAATCT CGAAACTGAA AAAATTTCAA 972 0 

AATTATTCCA AAAAATATTG ATGCAGACTA CCTTTTTGAT GGCTTCTGGT ACGTTTCTAG 97 80 

CGTCGAATGG ATTGGCTCCT CCAATAATTA AAGTCTCGTT CGGTAGTTTA GCCAGACGGA 98 4 0 
CGGTGTGCTT CAACATTTTT CTAATTAATC TATTTCAATT CAAGTCACTC ACTCTCTCTT 9900
GACGTCTTCT TCTATATTCC AAGAACTCTG CAGAAAATCC GTGTCCGCCT TGTGTGTTTC 9960 

TAGTTGGCGT CGGAGGATTC ACGGGTCCAA GACGAATGGA TGTCTAAAAA ATGTTATATT 1002 0 

TTTGCATAAA GAAAACACCA TACCTTCACC ACTTTTTGAG TTGTGGGCGT TCTGAATGGA 1008 0 

ATTGATCGAT TATTATTGCT CTTTCTTGAT TTGCTTCTAT CAGCTGCGTA ATGAGGTGTT 1014 0 

CTAAAGATCA GCTTTAATTC ATTTGGACAA GTGCTCCTCT AATAAACTTA CCCTGTACTC 10200 

ATTTTTGAAA CGATTTACGA TGATAAGATT GAAAGTGGAA GTTAAATTTA GTCTTTCAAA 10260 

GTTGAAATAA AATCTTCATA AATAAATAAA TTTAAATGAA AGATTAAATA AATTAACGTT 10320 

CACGTAGTTA AAAAAATAAT TTAAATCTTA ACTTCTAATA AAAAATCTCA ATTTTCCAGG 103 80 

ACTCGCATTC GTGTTCAAAA GTATTTTCCG GCTGGAACTC AACTACACGT ATCCATTGAA 10440

ATATGTGCTC GGCGATTCAT TGCTCGGTGG ATTCCATATT GGAGCTGGTG TCAACTTCTT 105 00 

GTAGAGATTA ATTGGATGCA AGCACCCCTC AAAAAGATTT TTTTGAAAAA CGATAAATTC 105 60 

ACAGAATTTC AGTTCTTTTT CTCCCCCTTT TATTGTTATT TTCATCGTAA TGCTGTGCTA 10 620 

GAAGTCAGAG TAAATATGAG TTTTTTTGTG TTCTAGGAAT TCCATTTTTT CAGGAAGCAA 10680 

ATTTAATAAA AATTATCGAA TTTCTTGCTC TAAAGATGTT GTACATTTTA TGGAAATGTT 10740 

CGTATAGTAA TTCGAACACT TTATATTTCT CGTTTTAAAA CTGTCGGTGT TTTATAGTAA 10 8 00 

ACTATCTTCA GAAAAAAATG AGCCTACGAA AAATCAATTT CGTAACTGGA AACGTGAAGA 10 8 60 

AGCTTGAAGA AGTCAAGGCT ATTTTGAAGA ATTTCGAGGT AAAATATATT TGATATTATT 10 920 

CGAACGCGAA ATTTTGCGCC AAAAGTACGA TGCCTGGTCT CAACACGACA ATATTTTGTT 10980 

AAATACAAAC GAATGTGCGC CTTCAAAGAA AAGTTTCAAT CTTTCGTTGC CGTGGAGATA 1104 0 

TTTTTAGAGT TTTTGTTTAA ATTATATATT TGTCGTATCG AAACCGGGTA CCGTAATCAA 11100 

TCAATTAAAT ATTTTCAGGT TTCAAAGGTG GATGTCGATT TGGATGAATT CCAAGGAGAA 1116 0 

CCCGAATTTA TTGCCGAAAG AAAGTGCCGT GAGGCTGTTG AAGCTGTAAA AGGGCCCGTT 11220

TTGGTATGGA AAATTGTATT TGTTCTAAAA ATTGTCAAAT TTCAGGTCGA AGACACAAGT 112 80 

TTATGCTTCA ACGCAATGGG CGGTCTTCCT GGACCTTATA TCAAGTGGTT TTTGAAGAAT 11340 

TTGAAACCAG AAGGACTACA TAATATGCTA GGTAAATATT TTAATTTTTT GAAAAAACTT 11400

ATTTTTCAGC CGGATTTTCT GACAAAACCG CCTATGCTCA ATGCATCTTT GCGTACACTG 114 60 

AAGGACTCGG AAAACCTATT CATGTATTTG CTGGTATGAT TTTTTGAATT TAATTCTTTA 11520

ATTTTATATG TTAATTTAGT TGTTTCATTC CTCAATTTAT GAGAGATTTT TTTTTCAATT 115 8 0 

TTTCTATTTC AGGAAAATGT CCTGGTCAAA TTGTTGCTCC ACGTGGTGAT ACTGCTTTTG 11640 

GATGGGATCC ATGCTTCCAG CCAGATGGTT TTAAAGAAAC ATTCGGAGAA ATGGATAAAG 117 00 

ATGTAAAAAA TGAAATTTCT CATCGTGCAA AGGCTCTGGA ACTCCTCAAG GAATATTTTC 117 60 

AGAATAATTA AATTATTTTT TCTCATCTAT GCAATTTCTT GAAAATTTGT TAAGTTTCCG 1182 0 

TTGTTATGCA TTTGCTTTTA TTTAAAAAAA AAAGAATATT TTTACATTAA TATTAGATAT 118 8 0 

GAGAAAAGAG TAATTTCTGG ATTTTAACCT TCCTACAAAA GAATATTTAT ATTTTTTGTA 11940 

TGATTTTTTA AAAATATCGT CAGGAAATAA TAACATTTCA GATATACCCT GAACTCTACA 12 0 00 

GTTTATGATA TTCAGGAAAT TTCTGAATTT TCTGAAACCT TACAAAATGC GAACGGATCC 12 060 

GATTATTTTC GTGATTGGGT GCACTGGAAC CGGGAAAAGT GATCTTGGAG TGGCAATTGC 1212 0 

AAAGAAATAT GGAGGAGAGG TGATTAGTGT AGATTCAATG CAATTTTATA AAGGTACATG 12180 

GGTTTTGTTT CAATTTTAAA TTAATTAATT TTCGTTTTTC AGGACTTGAC ATTGCCACGA 12 2 40 

ATAAGATAAC GGAAGAAGAA TCTGAAGGGA TTCAACATCA TATGATGTCA TTTTTGAATC 12 3 00 

CATCTGAATC ATCATCTTAT AATGTACATA GTTTCCGAGA AGTCACGTTG GATCTTATTA 12 3 60 

AAGTGCTTAA TTCGCCACTT TTTGAACTTG ATCCTAATTT TCATAATTTT CAGAAAATCC 1242 0 

GCGCCCGTTC AAAAATTCCT GTAATTGTCG GAGGAACCAC TTATTATGCT GAAAGTGTCC 124 8 0 

TTTATGAGAA TAATCTGATT GAAACCAACA CTTCAGATGA CGTGGATTCC AAATCGAGAA 12 540 

CATCATCAGA ATCGTCATCT GAAGACACTG AAGAAGGAAT TAGTAATCAA GAATTATGGG 12 600 

ATGAATTGAA AAAAATCGAC GAAAAATCAG CACTTCTTCT ACATCCAAAT AATCGTTATC 12660

GAGTACAGAG AGCATTGCAA ATTTTCAGAG AAACTGGTAA TTGATTTGCA AATTTCCAGA 12 72 0 

TTAAAAACAA ATCAAGTAAA GTTTTTTGCA GGAATCCGAA AAAGTGAACT TGTTGAAAAA 12 7 80 

CAGAAATCAG ATGAAACTGT TGATTTGGGT GGACGACTAC GATTTGATAA TTCTTTAGTT 12 84 0 

ATTTTTATGG ATGCAACACC TGAAGTTTTA GAAGAAAGAC TTGATGGAAG AGTTGATAAA 12 900 

ATGATTAAAT TGGGTTTGAA GAATGAATTG ATCGAGTTTT ATAACGAGGT AAATATTTGA 12 9 60 

ATTTTTCCAG AAAAAAAAAG AAAATTTTTT ATTATTTTGT TTTTTTTTCA TTCTTTACTA 13 02 0 

TTTTCCAAAA AAGTTTAAAC TTTTGAAAAC TGTTCAGAAA ATGTTCGTGT ATTTATTTTA 13080

GCTTACTGAG GCATTATTTC ATTGTGATTT TTACTATACT CTATAAACTA AATTTTCAGC 1314 0 

ACGCCGAGTA CATAAATCAC AGCAAATATG GTGTCATGCA ATGTATTGGT CTTAAAGAAT 132 0 0 
TCGTTCCATG GCTCAATTTG GACCCATCAG AAAGAGATAC ACTCAATGGG GATAAATTGT 13260

TCAAGCAAGG GTAATTTAAA TTTATTTTCA ATTTTTATAA ATTCCAAGCT ATTTTCAGAT 13 32 0 

GCGATGATGT GAAGCTTCAC ACTCGACAAT ATGCACGGCG CCAGAGACGG TGGTATCGAT 13380

CGAGACTTTT AAAACGGTCG GATGGTGATC GGGTATGTTG ATTTTAAAAA AATTGAATTT 1344 0 

TTAAAGAACT TTTTTACTAA ATTAACAAAG TTATTGGCTG AAAATGGCTG AAAATTATAG 13 50 0 

TAAAACTAAT CAAAAAAATT GAAATTTTGA ATTAAAGTCA TAAAGTGACG AC C AG AAAAT 13 560 

TAAAAAAAAA CATTTTTCTA TTTTAATTAA TTCACTCTAC TTCACTTTAA AAATAATTTT 1362 0 

CAGAAAATGG CAAGTACAAA AATGCTGGAT ACATCTGACA AGTACCGAAT AATTAGTGAT 13 680 

GGAATGGACA TTGTTGATCA ATGGATGAAT GGAATCGATC TATTTGAAGA TGTAAAATTT 1374 0 

CACAAATTCT AAAATTTCCG AATCACAAAT TAAAATTTCT ACAGATCTCC ACAGACACCA 13 8 00 

ATCCAATTCT AAAAGGGTCC GATGCAAATA TTCTGCTGAA TTGTGAAATC TGTAATATTT 13 8 60 

CAATGACTGG AAAAGATAAT TGGTTTGTTT CAATACATAT TATAATTTCG AAATGAATTT 13 92 0 

TTTCAGGCAG AAACAT AT CG ATGGGAAAAA GCACAAGCAT CATGCTAAGC AAAAGAAATT 13 98 0 

GGCAGAGACT CGCACATAAG ACGCTATATT TATTTTTTGT TAACTTAAAT TATTTTTGTT 14 04 0 

GTTGATTGTT CTCTAAATAA AAAAACAGCT CAGAGAGAAG ATTAGGCGCT CGTCCACATC 1410 0 

TCCGACGATA GTCAACCCGA ACGAAGGGAA CTATCTTTAA TTGT C AGTGA TGACGTCATG 14160 

TCGTCAAGAA CTCGTCATAG CTGTGAGAAT TGAACCATTA TAGATTTGGA CATTAGTTTA 142 2 0 

GGTTATATCC AGTACACTAA ATGGTACATG ATAGACAGTG TACATTTACA GATTTATAGA 142 8 0 

TTGTCTCAGT GACTAGTTAC CGGAAGAGGA GAGGAGAACA TGTGGCGATG TCTTTTGGAT 14 34 0 

CGATATTATT CCGTCTGAAA ATTGTTCACT AGGGGGACTG C CGATTAC C A CTTCACATGA 144 0 0 

CGGAACATGT TAGTTAAAAT ATTGGCTTTT ATACACATTT TCAAAATAGC ACCTGTAT 144 5 8 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 430 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



Met Ile Phe Arg Lys Phe Leu Asn Phe Leu Lys Pro Tyr Lys Met Arg
1               5                   10                  15

Thr Asp Pro Ile Ile Phe Val Ile Gly Cys Thr Gly Thr Gly Lys Ser
            20                  25                  30

Asp Leu Gly Val Ala Ile Ala Lys Lys Tyr Gly Gly Glu Val Ile Ser
        35                  40                  45

Val Asp Ser Met Gln Phe Tyr Lys Gly Leu Asp Ile Ala Thr Asn Lys
    50                  55                  60

Ile Thr Glu Glu Glu Ser Glu Gly Ile Gln His His Met Met Ser Phe
65                  70                  75                  80

Leu Asn Pro Ser Glu Ser Ser Ser Tyr Asn Val His Ser Phe Arg Glu
                85                  90                  95

Val Thr Leu Asp Leu Ile Lys Lys Ile Arg Ala Arg Ser Lys Ile Pro
            100                 105                 110

Val Ile Val Gly Gly Thr Thr Tyr Tyr Ala Glu Ser Val Leu Tyr Glu
        115                 120                 125

Asn Asn Leu Ile Glu Thr Asn Thr Ser Asp Asp Val Asp Ser Lys Ser
    130                 135                 140

Arg Thr Ser Ser Glu Ser Ser Ser Glu Asp Thr Glu Glu Gly Ile Ser
145                 150                 155                 160

Asn Gln Glu Leu Trp Asp Glu Leu Lys Lys Ile Asp Glu Lys Ser Ala
                165                 170                 175

Leu Leu Leu His Pro Asn Asn Arg Tyr Arg Val Gln Arg Ala Leu Gln
            180                 185                 190

Ile Phe Arg Glu Thr Gly Ile Arg Lys Ser Glu Leu Val Glu Lys Gln
        195                 200                 205

Lys Ser Asp Glu Thr Val Asp Leu Gly Gly Arg Leu Arg Phe Asp Asn
    210                 215                 220

Ser Leu Val Ile Phe Met Asp Ala Thr Pro Glu Val Leu Glu Glu Arg
225                 230                 235                 240

Leu Asp Gly Arg Val Asp Lys Met Ile Lys Leu Gly Leu Lys Asn Glu
                245                 250                 255

Leu Ile Glu Phe Tyr Asn Glu His Ala Glu Tyr Ile Asn His Ser Lys
            260                 265                 270

Tyr Gly Val Met Gln Cys Ile Gly Leu Lys Glu Phe Val Pro Trp Leu
        275                 280                 285

Asn Leu Asp Pro Ser Glu Arg Asp Thr Leu Asn Gly Asp Lys Leu Phe
    290                 295                 300

Lys Gln Gly Cys Asp Asp Val Lys Leu His Thr Arg Gln Tyr Ala Arg
305                 310                 315                 320

Arg Gln Arg Arg Trp Tyr Arg Ser Arg Leu Leu Lys Arg Ser Asp Gly
                325                 330                 335

Asp Arg Lys Met Ala Ser Thr Lys Met Leu Asp Thr Ser Asp Lys Tyr
            340                 345                 350

Arg Ile Ile Ser Asp Gly Met Asp Ile Val Asp Gln Trp Met Asn Gly
        355                 360                 365

Ile Asp Leu Phe Glu Asp Ile Ser Thr Asp Thr Asn Pro Ile Leu Lys
    370                 375                 380

Gly Ser Asp Ala Asn Ile Leu Leu Asn Cys Glu Ile Cys Asn Ile Ser
385                 390                 395                 400

Met Thr Gly Lys Asp Asn Trp Gln Lys His Ile Asp Gly Lys Lys His
                405                 410                 415

Lys His His Ala Lys Gln Lys Lys Leu Ala Glu Thr Arg Thr
            420                 425                 430







(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2041 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CTGCCATAAG ATGGCGTCCG TGGCGGCTGC ACGAGCAGTT CCTGTGGGCA GTGGGCTCAG 60

GGGCCTGCAA CGGACCCTAC CTCTTGTAGT GATTCTCGGG GCCACGGGCA CCGGCAAATC 120

CACGCTGGCG TTGCAGCTAG GCCAGCGGCT CGGCGGTGAG ATCGTCAGCG CTGACTCCAT 180

GCAGGTCTAT GAAGGCCTAG ACATCATCAC CAACAAGGTT TCTGCCCAAG AGCAGAGAAT 240

CTGCCGGCAC CACATGATCA GCTTTGTGGA TCCTCTTGTG ACCAATTACA CAGTGGTGGA 300

CTTCAGAAAT AGAGCAACTG CTCTGATTGA AGATATATTT GCCCGAGACA AAATTCCTAT 360

TGTTGTGGGA GGAACCAATT ATTACATTGA ATCTCTGCTC TGGAAAGTTC TTGTCAATAC 420

CAAGCCCCAG GAGATGGGCA CTGAGAAAGT GATTGACCGA AAAGTGGAGC TTGAAAAGGA 480

GGATGGTCTT GTACTTCACA AACGCCTAAG CCAGGTGGAC CCAGAAATGG CTGCCAAGCT 540

GCATCCACAT GACAAACGCA AAGTGGCCAG GAGCTTGCAA GTTTTTGAAG AAACAGGAAT 600

CTCTCATAGT GAATTTCTCC ATCGTCAACA TACGGAAGAA GGTGGTGGTC CCCTTGGAGG 660

TCCTCTGAAG TTCTCTAACC CTTGCATCCT TTGGCTTCAT GCTGACCAGG CAGTTCTAGA 720

TGAGCGCTTG GATAAGAGGG TGGATGACAT GCTTGCTGCT GGGCTCTTGG AGGAACTAAG 780

AGATTTTCAC AGACGCTATA ATCAGAAGAA TGTTTCGGAA AATAGCCAGG ACTATCAACA 840

TGGTATCTTC CAATCAATTG GCTTCAAGGA ATTTCACGAG TACCTGATCA CTGAGGGAAA 900

ATGCACACTG GAGACTAGTA ACCAGCTTCT AAAGAAAGGA CCTGGTCCCA TTGTCCCCCC 960

TGTCTATGGC TTAGAGGTAT CTGATGTCTC GAAGTGGGAG GAGTCTGTTC TTGAACCTGC 1020

TCTTGAAATC GTGCAAAGTT TCATCCAGGG CCACAAGCCT ACAGCCACTC CAATAAAGAT 1080

GCCATACAAT GAAGCTGAGA ACAAGAGAAG TTATCACCTG TGTGACCTCT GTGATCGAAT 1140

CATCATTGGG GATCGCGAAT GGGCAGCGCA CATAAAATCC AAATCCCACT TGAACCAACT 1200

GAAGAAAAGA AGAAGATTGG ACTCAGATGC TGTCAACACC ATAGAAAGTC AGAGTGTTTC 1260

CCCAGACTAT AACAAAGAAC CTAAAGGGAA GGGATCCCCA GGGCAGAATG ATCAAGAGCT 1320

GAAATGCAGC GTTTAAGAGA CATGTCCAGT GGCCTTTGGA AAGGTGGTGG GGATCCAGTT 1380

CAGGAGGGAG GGGTATGTTT GTCTCCCAGT CTGGGCAAAG GAGTGCTATG CGGAATTCTC 1440

TGCATAGCAG AAAAGCTCCC ACCATTTTCT TTTGATGTGG TTTTAAAGTC TCACGTTCTC 1500

TATAATAGAA ACAGCAGGTC TTGTCAGCTC CTTGTGTGGC TGATGTGTCT GGAAATGATG 1560

TAGTTCAGGA AAGCATTTTT TTTTTCTTTG AACCTTAAAG GTTCTATTAT TAAAAGCAGC 1620

ACAGATTCCA CATTTTTATA CATGAGGATC TTCTTTGTGG TGAATACCAG GATTGACTGC 1680

ATCCCTTTAA AAGAAGTTTT ATGTCCCTGA CTCTGGCTAA AATTATCTAA TTTCCAGATG 1740

CTTTTGTAGA TGACTGAAGT ATTTGTGAGC CACATATTGG GAGTTCTAGA TTTGAGTGAA 1800

TGGCAGGAAA GGGCCATCTC CATTGAGATG ATTAAGTGAA CCAAACTAGT TCTCGGAATT 1860

CTACAGAGAA GGAGGGAATC AGACTGAGGA AGCTGTGACA TAGGACTTGA AGACCAAAGA 1920

CTTTGAAATT TGCGAGCTGC TCATGTGTGA GTTATTATCA CTGCTGTCTT TCTATTGAGT 1980

TACAAATCTA TATTTTTATT GAAGTTTAAA TAAAGAAAAA ATTTACAAGA AAAAAAAAAA 2040

A 2041



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 892 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



Met Phe Arg Lys Leu Gly Ser Ser Gly Ser Leu Trp Lys Pro Lys Asn
1               5                   10                  15

Pro His Ser Leu Glu Tyr Leu Lys Tyr Leu Gln Gly Val Leu Thr Lys
            20                  25                  30

Asn Glu Lys Val Thr Glu Asn Asn Lys Lys Ile Leu Val Glu Ala Leu
        35                  40                  45

Arg Ala Ile Ala Glu Ile Leu Ile Trp Gly Asp Gln Asn Asp Ala Ser
    50                  55                  60

Val Phe Asp Phe Phe Leu Glu Arg Gln Met Leu Leu Tyr Phe Leu Lys
65                  70                  75                  80

Ile Met Glu Gln Gly Asn Thr Pro Leu Asn Val Gln Leu Leu Gln Thr
                85                  90                  95

Leu Asn Ile Leu Phe Glu Asn Ile Arg His Glu Thr Ser Leu Tyr Phe
            100                 105                 110

Leu Leu Ser Asn Asn His Val Asn Ser Ile Ile Ser His Lys Phe Asp
        115                 120                 125

Leu Gln Asn Asp Glu Ile Met Ala Tyr Tyr Ile Ser Phe Leu Lys Thr
    130                 135                 140

Leu Ser Phe Lys Leu Asn Pro Ala Thr Ile His Phe Phe Phe Asn Glu
145                 150                 155                 160

Thr Thr Glu Glu Phe Pro Leu Leu Val Glu Val Leu Lys Leu Tyr Asn
                165                 170                 175

Trp Asn Glu Ser Met Val Arg Ile Ala Val Arg Asn Ile Leu Leu Asn
            180                 185                 190

Ile Val Arg Val Gln Asp Asp Ser Met Ile Ile Phe Ala Ile Lys His
        195                 200                 205

Thr Lys Glu Tyr Leu Ser Glu Leu Ile Asp Ser Leu Val Gly Leu Ser
    210                 215                 220

Leu Glu Met Asp Thr Phe Val Arg Ser Ala Glu Asn Val Leu Ala Asn
225                 230                 235                 240

Arg Glu Arg Leu Arg Gly Lys Val Asp Asp Leu Ile Asp Leu Ile His
                245                 250                 255

Tyr Ile Gly Glu Leu Leu Asp Val Glu Ala Val Ala Glu Ser Leu Ser
            260                 265                 270

Ile Leu Val Thr Thr Arg Tyr Leu Ser Pro Leu Leu Leu Ser Ser Ile
        275                 280                 285

Ser Pro Arg Arg Asp Asn His Ser Leu Leu Leu Thr Pro Ile Ser Ala
    290                 295                 300

Leu Phe Phe Phe Ser Glu Phe Leu Leu Ile Val Arg His His Glu Thr
305                 310                 315                 320

Ile Tyr Thr Phe Leu Ser Ser Phe Leu Phe Asp Thr Gln Asn Thr Leu
                325                 330                 335

Thr Thr His Trp Ile Arg His Asn Glu Lys Tyr Cys Leu Glu Pro Ile
            340                 345                 350

Thr Leu Ser Ser Pro Thr Gly Glu Tyr Val Asn Glu Asp His Val Phe
        355                 360                 365

Phe Asp Phe Leu Leu Glu Ala Phe Asp Ser Ser Gln Ala Asp Asp Ser
    370                 375                 380

Lys Ala Phe Tyr Gly Leu Met Leu Ile Tyr Ser Met Phe Gln Asn Asn
385                 390                 395                 400

Ala Asp Val Gly Glu Leu Leu Ser Ala Ala Asn Phe Pro Val Leu Lys
                405                 410                 415

Glu Ser Thr Thr Thr Ser Leu Ala Gln Gln Asn Leu Ala Arg Leu Arg
            420                 425                 430

Ile Ala Ser Thr Ser Ser Ile Ser Lys Arg Thr Arg Ala Ile Thr Glu
        435                 440                 445

Ile Gly Val Glu Ala Thr Glu Glu Asp Glu Ile Phe His Asp Val Pro
    450                 455                 460

Glu Glu Gln Thr Leu Glu Asp Leu Val Asp Asp Val Leu Val Asp Thr
465                 470                 475                 480

Glu Asn Ser Ala Ile Ser Asp Pro Glu Pro Lys Asn Val Glu Ser Glu
                485                 490                 495

Ser Arg Ser Arg Phe Gln Ser Ala Val Asp Glu Leu Pro Pro Pro Ser
            500                 505                 510

Thr Ser Gly Cys Asp Gly Arg Leu Phe Asp Ala Leu Ser Ser Ile Ile
        515                 520                 525

Lys Ala Val Gly Thr Asp Asp Asn Arg Ile Arg Pro Ile Thr Leu Glu
    530                 535                 540

Leu Ala Cys Leu Val Ile Arg Gln Ile Leu Met Thr Val Asp Asp Glu
545                 550                 555                 560

Lys Val His Thr Ser Leu Thr Lys Leu Cys Phe Glu Val Arg Leu Lys
                565                 570                 575

Leu Leu Ser Ser Ile Gly Gln Tyr Val Asn Gly Glu Asn Leu Phe Leu
            580                 585                 590

Glu Trp Phe Glu Asp Glu Tyr Ala Glu Phe Glu Val Asn His Val Asn
        595                 600                 605

Phe Asp Ile Ile Gly His Glu Met Leu Leu Pro Pro Ala Ala Thr Pro
    610                 615                 620

Leu Ser Asn Leu Leu Leu His Lys Arg Leu Pro Ser Gly Phe Glu Glu
625                 630                 635                 640

Arg Ile Arg Thr Gln Ile Val Phe Tyr Leu His Ile Arg Lys Leu Glu
                645                 650                 655

Arg Asp Leu Thr Gly Glu Gly Asp Thr Glu Leu Pro Val Arg Val Leu
            660                 665                 670

Asn Ser Asp Gln Glu Pro Val Ala Ile Gly Asp Cys Ile Asn Leu His
        675                 680                 685

Asn Ser Asp Leu Leu Ser Cys Thr Val Val Pro Gln Gln Leu Cys Ser
    690                 695                 700

Leu Gly Lys Pro Gly Asp Arg Leu Ala Arg Phe Leu Val Thr Asp Arg
705                 710                 715                 720

Leu Gln Leu Ile Leu Val Glu Pro Asp Ser Arg Lys Ala Gly Trp Ala
                725                 730                 735

Ile Val Arg Phe Val Gly Leu Leu Gln Asp Thr Thr Ile Asn Gly Asp
            740                 745                 750

Ser Thr Asp Ser Lys Val Leu His Val Val Val Glu Gly Gln Pro Ser
        755                 760                 765

Arg Ile Lys Lys Arg His Pro Val Leu Thr Ala Lys Phe Ile Phe Asp
    770                 775                 780

Asp His Ile Arg Cys Met Ala Ala Lys Gln Arg Leu Thr Lys Gly Arg
785                 790                 795                 800

Gln Thr Ala Arg Gly Leu Lys Leu Gln Ala Ile Cys Ser Ala Leu Gly
                805                 810                 815

Val Pro Arg Ile Asp Pro Ala Thr Met Thr Ser Ser Pro Arg Met Asn
            820                 825                 830

Pro Phe Arg Ile Val Lys Gly Cys Ala Pro Gly Ser Val Arg Lys Thr
        835                 840                 845

Val Ser Thr Ser Ser Ser Ser Ser Gln Gly Arg Pro Gly His Tyr Ser
    850                 855                 860

Ala Asn Leu Arg Ser Ala Ser Arg Asn Ala Gly Met Ile Pro Asp Asp
865                 870                 875                 880

Pro Thr Gln Pro Ser Ser Ser Ser Glu Arg Arg Ser
                885                 890



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 355 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



Met Ala Glu Lys Ala Glu Asn Leu Pro Ser Ser Ser Ala Glu Ala Ser
1               5                   10                  15

Glu Glu Pro Ser Pro Gln Thr Gly Pro Asn Val Asn Gln Lys Pro Ser
            20                  25                  30

Ile Leu Val Leu Gly Met Ala Gly Ser Gly Lys Thr Thr Phe Val Gln
        35                  40                  45

Arg Leu Thr Ala Phe Leu His Ala Arg Lys Thr Pro Pro Tyr Val Ile
    50                  55                  60

Asn Leu Asp Pro Ala Val Ser Lys Val Pro Tyr Pro Val Asn Val Asp
65                  70                  75                  80

Ile Arg Asp Thr Val Lys Tyr Lys Glu Val Met Lys Glu Phe Gly Met
                85                  90                  95

Gly Pro Asn Gly Ala Ile Met Thr Cys Leu Asn Leu Met Cys Thr Arg
            100                 105                 110

Phe Asp Lys Val Ile Glu Leu Ile Asn Lys Arg Ser Ser Asp Phe Ser
        115                 120                 125

Val Cys Leu Leu Asp Thr Pro Gly Gln Ile Glu Ala Phe Thr Trp Ser
    130                 135                 140

Ala Ser Gly Ser Ile Ile Thr Asp Ser Leu Ala Ser Ser His Pro Thr
145                 150                 155                 160

Val Val Met Tyr Ile Val Asp Ser Ala Arg Ala Thr Asn Pro Thr Thr
                165                 170                 175

Phe Met Ser Asn Met Leu Tyr Ala Cys Ser Ile Leu Tyr Arg Thr Lys
            180                 185                 190

Leu Pro Phe Ile Val Val Phe Asn Lys Ala Asp Ile Val Lys Pro Thr
        195                 200                 205

Phe Ala Leu Lys Trp Met Gln Asp Phe Glu Arg Phe Asp Glu Ala Leu
    210                 215                 220

Glu Asp Ala Arg Ser Ser Tyr Met Asn Asp Leu Ser Arg Ser Leu Ser
225                 230                 235                 240

Leu Val Leu Asp Glu Phe Tyr Cys Gly Leu Lys Thr Val Cys Val Ser
                245                 250                 255

Ser Ala Thr Gly Glu Gly Phe Glu Asp Val Met Thr Ala Ile Asp Glu
            260                 265                 270

Ser Val Glu Ala Tyr Lys Lys Glu Tyr Val Pro Met Tyr Glu Lys Val
        275                 280                 285

Leu Ala Glu Lys Lys Leu Leu Asp Glu Glu Glu Arg Lys Lys Arg Asp
    290                 295                 300

Glu Glu Thr Leu Lys Gly Lys Ala Val His Asp Leu Asn Lys Val Ala
305                 310                 315                 320

Asn Pro Asp Glu Phe Leu Glu Ser Glu Leu Asn Ser Lys Ile Asp Arg
                325                 330                 335

Ile His Leu Gly Gly Val Asp Glu Glu Asn Glu Glu Asp Ala Glu Leu
            340                 345                 350

Glu Arg Ser
        355



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 434 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:



Met Ser Glu Lys Thr Phe His Lys Ala Gln Thr Ile Arg Ala Lys Ala
1               5                   10                  15

Ser Gly Val Pro Ser Ile Val Glu Ala Val Gln Phe His Gly Val Arg
            20                  25                  30

Ile Thr Lys Asn Asp Ala Leu Val Lys Glu Val Ser Glu Leu Tyr Arg
        35                  40                  45

Ser Lys Asn Leu Asp Glu Leu Val His Asn Ser His Leu Ala Ala Arg
    50                  55                  60

His Leu Gln Glu Val Gly Leu Met Asp Asn Ala Val Ala Leu Ile Asp
65                  70                  75                  80

Thr Ser Pro Ser Ser Asn Glu Gly Tyr Val Val Asn Phe Leu Val Arg
                85                  90                  95

Glu Pro Lys Ser Phe Thr Ala Gly Val Lys Ala Gly Val Ser Thr Asn
            100                 105                 110

Gly Asp Ala Asp Val Ser Leu Asn Ala Gly Lys Gln Ser Val Gly Gly
        115                 120                 125

Arg Gly Glu Ala Ile Asn Thr Gln Tyr Thr Tyr Thr Val Lys Gly Asp
    130                 135                 140

His Cys Phe Asn Ile Ser Ala Ile Lys Pro Phe Leu Gly Trp Gln Lys
145                 150                 155                 160

Tyr Ser Asn Val Ser Ala Thr Leu Tyr Arg Ser Leu Ala His Met Pro
                165                 170                 175

Trp Asn Gln Ser Asp Val Asp Glu Asn Ala Ala Val Leu Ala Tyr Asn
            180                 185                 190

Gly Gln Leu Trp Asn Gln Lys Leu Leu His Gln Val Lys Leu Asn Ala
        195                 200                 205

Ile Trp Arg Thr Leu Arg Ala Thr Arg Asp Ala Ala Phe Ser Val Arg
    210                 215                 220

Glu Gln Ala Gly His Thr Leu Lys Phe Ser Leu Glu Asn Ala Val Ala
225                 230                 235                 240

Val Asp Thr Arg Asp Arg Pro Ile Leu Ala Ser Arg Gly Ile Leu Ala
                245                 250                 255

Arg Phe Ala Gln Glu Tyr Ala Gly Val Phe Gly Asp Ala Ser Phe Val
            260                 265                 270

Lys Asn Thr Leu Asp Leu Gln Ala Ala Ala Pro Leu Pro Leu Gly Phe
        275                 280                 285

Ile Leu Ala Ala Ser Phe Gln Ala Lys His Leu Lys Gly Leu Gly Asp
    290                 295                 300

Arg Glu Val His Ile Leu Asp Arg Cys Tyr Leu Gly Gly Gln Gln Asp
305                 310                 315                 320

Val Arg Gly Phe Gly Leu Asn Thr Ile Gly Val Lys Ala Asp Asn Ser
                325                 330                 335

Cys Leu Gly Gly Gly Ala Ser Leu Ala Gly Val Val His Leu Tyr Arg
            340                 345                 350

Pro Leu Ile Pro Pro Asn Met Leu Phe Ala His Ala Phe Leu Ala Ser
        355                 360                 365

Gly Ser Val Ala Ser Val His Ser Lys Asn Leu Val Gln Gln Leu Gln
    370                 375                 380

Asp Thr Gln Arg Val Ser Ala Gly Phe Gly Leu Ala Phe Val Phe Lys
385                 390                 395                 400

Ser Ile Phe Arg Leu Glu Leu Asn Tyr Thr Tyr Pro Leu Lys Tyr Val
                405                 410                 415

Leu Gly Asp Ser Leu Leu Gly Gly Phe His Ile Gly Ala Gly Val Asn
            420                 425                 430

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 198 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



Met Leu Tyr Ile Leu Trp Lys Leu Asn Tyr Leu Gln Lys Lys Met Ser
1               5                   10                  15

Leu Arg Lys Ile Asn Phe Val Thr Gly Asn Val Lys Lys Leu Glu Glu
            20                  25                  30

Val Lys Ala Ile Leu Lys Asn Phe Glu Val Ser Asn Val Asp Val Asp
        35                  40                  45

Leu Asp Glu Phe Gln Gly Glu Pro Glu Phe Ile Ala Glu Arg Lys Cys
    50                  55                  60

Arg Glu Ala Val Glu Ala Val Lys Gly Pro Val Leu Val Glu Asp Thr
65                  70                  75                  80

Ser Leu Cys Phe Asn Ala Met Gly Gly Leu Pro Gly Pro Tyr Ile Lys
                85                  90                  95

Trp Phe Leu Lys Asn Leu Lys Pro Glu Gly Leu His Asn Met Leu Ala
            100                 105                 110

Gly Phe Ser Asp Lys Thr Ala Tyr Ala Gln Cys Ile Phe Ala Tyr Thr
        115                 120                 125

Glu Gly Leu Gly Lys Pro Ile His Val Phe Ala Gly Lys Cys Pro Gly
    130                 135                 140

Gln Ile Val Ala Pro Arg Gly Asp Thr Ala Phe Gly Trp Asp Pro Cys
145                 150                 155                 160

Phe Gln Pro Asp Gly Phe Lys Glu Thr Phe Gly Glu Met Asp Lys Asp
                165                 170                 175

Val Lys Asn Glu Ile Ser His Arg Ala Lys Ala Leu Glu Leu Leu Lys
            180                 185                 190

Glu Tyr Phe Gln Asn Asn
        195



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CGAACACTTT ATATTTCTCG 20

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GATAGTTCCC TTCGTTCGGG 20



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTTCTGGATT TTAACCTTCC 20

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
TTTCCGAGAA GTCACGTTGG 20

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TACAGGAATT TTTGAACGGG 20

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTTCAGATGA CGTGGATTCC 20



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
GGAATCCGAA AAAGTGAACT 20

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
AAGAGATACA CTCAATGGGG 20

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
ATCGATACCA CCGTCTCTGG 20

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TTGAATCTAC ACTAATCACC 20



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CCAATTATCT TTTCCAGTCA 20

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ACATTATAAA GTTACTGTCC 20

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
TTTTAGTTAA AGCATTGACC 20

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
ACATCTTTAT CCATTTCTCC 20



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TGCAAAGGCT CTGGAACTCC 20

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AAAAACCACT TGATATAAGG 20

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CATCCAAAAG CAGTATCACC 20

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TTAATTGGAT GCAAGCACCC C 21



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATTACTATAC GAACATTTCC 20

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TTGTAAAGGC GTTAGTTTGG 20



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CAGGAGTATT TGGTGATGCG 20

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CGACGGGGAG AAGGTGACGG
AAAACTTCTA CCAACAATGG
CGTAATCTCT CTCGATTAGC
CCGTGGGATG GCTACTTGCC 20

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
TGGATTTGTG GCACGAGCGG 20



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TTGATTGCCT CTCCTCGTCC 20

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
ATCAACATCT GATTGATTCC 20

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
CAGCGAGCGC ATGCAACTAT ATATTGAGCA GG 32

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
AATAAATATT TAAATATTCA GATATACCCT GAACTCTACA G 41



(2) INFORMATION FOR SEQ ID NO : 3 8 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 8 : 
AAACTGTAGA GTTCAGGGTA TATCTGAATA TTTAAATATT TATTC 4 5 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GTACGTGGAG CTCTGCAACT ATATATTGAG CAGG 34 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
ATGACACTGC AGGATAGTTC CCTTCGTTCG GG 32



(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GTGTTGCATC AGTTCATTCC 20



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GCTGTGCTAG AAGTCAGAGG 20



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GTTCTCCTTG GAATTCATCC 20



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
AGTATATCTA GATGTGCGAG TCTCTGCCAA TT 32



(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
AGTAATTGTA CATTTAGTGG 20



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
ATTAACCTTA CTTACTTACC 20



(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
CTAAACTAAG TAATATAACC 20



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GTTGATTCTT TGAGCACTGG 20



(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
AATTCGACCA ATTACATTGG 20



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
AACATAGTTG TTGAGGAAGG 20



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
AATTAATGGA GATTCTACGG 20



(2) INFORMATION FOR SEQ ID NO:52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TCAGCATCTA GAAATGCAGG 20



(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
CGAATGTCAA CATTCACTGG 20



(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
CTTAACCTGA TGTGTACTCG 20



(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
ATGAAGCTTT AGAGGATGCC 20



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
CGACGAATTT CTGGAGTCGG 20



(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
ACTGCATTAT CCATTAATCC 20



(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
CACCCAAATA ACATCTATCC 20



(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
TTTAACCTCA TCTTCGCTGG 20



(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
ATGTTCCGCA AGCTTGGTTC 20



(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
TTTAATTACC CAAGTTTGAG 20



(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
TTTTAACCCA GTTACTCAAG 20



